The Challenges of Exploration for Offline Reinforcement Learning
Saved in:
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but just as critical to data-efficient RL is the collection of informative data. The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest due to the possibility of collecting a single dataset and using it to solve several downstream tasks as they arise. We investigate this setting via curiosity-based intrinsic motivation, a family of exploration methods which encourage the agent to explore those states or transitions it has not yet learned to model. With Explore2Offline, we propose to evaluate the quality of collected data by transferring the collected data and inferring policies with reward relabelling and standard offline RL algorithms. We evaluate a wide variety of data collection strategies, including a new exploration agent, Intrinsic Model Predictive Control (IMPC), using this scheme and demonstrate their performance on various tasks. We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL. |
DOI: | 10.48550/arxiv.2201.11861 |
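The summary above names two mechanisms without giving details: curiosity-based intrinsic motivation (rewarding states or transitions the agent has not yet learned to model) and reward relabelling of a task-agnostic dataset for downstream offline RL. The sketch below is an illustrative reconstruction, not code from the paper: the linear forward model, the `intrinsic_reward` and `relabel` helpers, and the toy downstream reward are all assumptions made for the example.

```python
# Illustrative sketch (not from the paper): curiosity-style intrinsic reward
# from forward-model prediction error, and reward relabelling of a
# task-agnostic dataset for downstream offline RL. All names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

class ForwardModel:
    """Tiny learned dynamics model: predicts next state from (state, action).
    A linear least-squares fit stands in for a learned neural model."""
    def __init__(self):
        self.W = np.zeros((STATE_DIM + ACTION_DIM, STATE_DIM))

    def fit(self, states, actions, next_states):
        X = np.hstack([states, actions])
        self.W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

    def predict(self, states, actions):
        return np.hstack([states, actions]) @ self.W

def intrinsic_reward(model, states, actions, next_states):
    # Curiosity signal: prediction error on transitions the model has not yet
    # learned to capture; high error -> "novel" transition -> high reward.
    err = next_states - model.predict(states, actions)
    return np.linalg.norm(err, axis=-1)

def relabel(dataset, task_reward_fn):
    # Reward relabelling: keep the task-agnostic transitions, but attach the
    # reward of the downstream task that becomes known only at transfer time.
    out = dict(dataset)
    out["rewards"] = task_reward_fn(dataset["states"],
                                    dataset["actions"],
                                    dataset["next_states"])
    return out

# Synthetic "collected" task-agnostic transitions, purely for demonstration.
N = 256
dataset = {
    "states": rng.normal(size=(N, STATE_DIM)),
    "actions": rng.normal(size=(N, ACTION_DIM)),
}
dataset["next_states"] = dataset["states"] + 0.1 * rng.normal(size=(N, STATE_DIM))

model = ForwardModel()
model.fit(dataset["states"], dataset["actions"], dataset["next_states"])
r_int = intrinsic_reward(model, dataset["states"], dataset["actions"],
                         dataset["next_states"])

# Hypothetical downstream task defined at transfer time: reach the origin.
task_reward = lambda s, a, s_next: -np.linalg.norm(s_next, axis=-1)
offline_dataset = relabel(dataset, task_reward)

print("mean intrinsic reward:", r_int.mean())
print("mean relabelled task reward:", offline_dataset["rewards"].mean())
```

In the framework the summary describes, a standard offline RL algorithm would then be trained on `offline_dataset`; that training step is omitted from this sketch.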