DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors
Main Authors: | |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Order full text |
Abstract: | Learning from previously collected data via behavioral cloning or offline
reinforcement learning (RL) is a powerful recipe for scaling generalist agents
by avoiding the need for expensive online learning. Despite strong
generalization in some respects, agents are often remarkably brittle to minor
visual variations in control-irrelevant factors such as the background or
camera viewpoint. In this paper, we present the DeepMind Control Visual
Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to
evaluate the robustness of offline RL agents for solving continuous control
tasks from visual input in the presence of visual distractors. In contrast to
prior works, our dataset (a) combines locomotion and navigation tasks of
varying difficulties, (b) includes static and dynamic visual variations, (c)
considers data generated by policies with different skill levels, (d)
systematically returns pairs of state and pixel observation, (e) is an order of
magnitude larger, and (f) includes tasks with hidden goals. Accompanying our
dataset, we propose three benchmarks to evaluate representation learning
methods for pretraining, and carry out experiments on several recently proposed
methods. First, we find that pretrained representations do not help policy
learning on DMC-VB, and we highlight a large representation gap between
policies learned on pixel observations and on states. Second, we demonstrate
that, when expert data is limited, policy learning can benefit from representations
pretrained on (a) suboptimal data, and (b) tasks with stochastic hidden goals.
Our dataset and benchmark code to train and evaluate agents are available at:
https://github.com/google-deepmind/dmc_vision_benchmark. |
DOI: | 10.48550/arxiv.2409.18330 |
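The abstract describes learning continuous-control policies from pixel observations via behavioral cloning. As a rough, self-contained illustration of that setup (this is not the dmc_vision_benchmark API; the shapes, synthetic data, and linear policy are hypothetical stand-ins for DMC-VB episodes and a real CNN encoder), a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 64x64 RGB frames flattened to vectors, 6-dim actions.
# Random data stands in for (pixel observation, expert action) pairs.
obs = rng.normal(size=(256, 64 * 64 * 3)).astype(np.float32)
actions = rng.normal(size=(256, 6)).astype(np.float32)

# A linear policy trained with mean-squared error: a toy stand-in for the
# CNN encoder + MLP head an actual pixel-based agent would use.
W = np.zeros((obs.shape[1], actions.shape[1]), dtype=np.float32)
lr = 1e-4
for step in range(100):
    pred = obs @ W                              # predicted actions
    grad = obs.T @ (pred - actions) / len(obs)  # MSE gradient (up to a constant)
    W -= lr * grad

print("final MSE:", float(np.mean((obs @ W - actions) ** 2)))
```

The benchmark's point is that such pixel-trained policies degrade under control-irrelevant visual variation; the actual training and evaluation code is in the repository linked in the abstract.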