Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
Saved in:

Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed, but they require large amounts of interaction between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning in order to adapt faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), it is still unclear whether current unsupervised strategies can improve generalization capabilities, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL to pre-train the agent, and a task-aware fine-tuning strategy combined with a newly proposed hybrid planner, Dyna-MPC, to adapt the agent to downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. We validate our design choices and analyze our models through a large-scale empirical study. We also show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/ |
---|---|
DOI: | 10.48550/arxiv.2209.12016 |
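
The abstract names Dyna-MPC, a hybrid planner that combines Dyna-style policy and value learning with model-predictive control over a learned world model. As a rough illustration of that general idea (not the paper's implementation), the sketch below shows a toy CEM-style planner that seeds one candidate action sequence with a learned actor and scores imagined rollouts with model rewards plus a terminal value estimate; all components (`model_step`, `reward_fn`, `value_fn`, `actor_fn`) and hyperparameters are hypothetical stand-ins.

```python
# Illustrative sketch only: a CEM-style planner over a learned latent model,
# where one candidate action sequence is seeded by a learned policy (the
# "Dyna" ingredient) and rollouts are scored with rewards plus a terminal
# value estimate. Toy stand-ins throughout, not the paper's implementation.
import numpy as np

ACT_DIM, HORIZON, CANDIDATES, ELITES, ITERS = 2, 10, 256, 32, 4

def model_step(state, action):   # toy latent dynamics: s' = f(s, a)
    return 0.9 * state + 0.1 * np.tanh(action.sum())

def reward_fn(state, action):    # toy learned reward head
    return -np.abs(state) - 0.01 * np.square(action).sum()

def value_fn(state):             # toy learned value (critic), bootstraps rollouts
    return -np.abs(state)

def actor_fn(state):             # toy learned policy, proposes a mean action
    return np.tanh(-state) * np.ones(ACT_DIM)

def plan(state):
    """Return the first action of the best sequence found by CEM-like search."""
    mean = np.zeros((HORIZON, ACT_DIM))
    std = np.ones((HORIZON, ACT_DIM))
    for _ in range(ITERS):
        # sample candidate action sequences around the current search distribution
        seqs = mean + std * np.random.randn(CANDIDATES, HORIZON, ACT_DIM)
        # seed candidate 0 with the actor's on-model rollout
        s = state
        for t in range(HORIZON):
            seqs[0, t] = actor_fn(s)
            s = model_step(s, seqs[0, t])
        # score every candidate by imagined return + terminal value
        returns = np.zeros(CANDIDATES)
        for i in range(CANDIDATES):
            s = state
            for t in range(HORIZON):
                returns[i] += reward_fn(s, seqs[i, t])
                s = model_step(s, seqs[i, t])
            returns[i] += value_fn(s)
        # refit the Gaussian to the elite sequences
        elite = seqs[np.argsort(returns)[-ELITES:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]  # execute only the first action, then replan

print(plan(np.float64(1.0)))
```

In the paper's actual setup the dynamics, reward, actor, and critic come from the pre-trained world model; this toy version only conveys how decision-time planning and learned value estimates can be combined.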