Replay across Experiments: A Natural Extension of Off-Policy RL
Main authors: , , , , , , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Replaying data is a principal mechanism underlying the stability and data
efficiency of off-policy reinforcement learning (RL). We present an effective
yet simple framework to extend the use of replays across multiple experiments,
minimally adapting the RL workflow for sizeable improvements in controller
performance and research iteration times. At its core, Replay Across
Experiments (RaE) involves reusing experience from previous experiments to
improve exploration and bootstrap learning while reducing required changes to a
minimum in comparison to prior work. We empirically show benefits across a
number of RL algorithms and challenging control domains spanning both
locomotion and manipulation, including hard exploration tasks from egocentric
vision. Through comprehensive ablations, we demonstrate robustness to the
quality and amount of data available and various hyperparameter choices.
Finally, we discuss how our approach can be applied more broadly across
research life cycles and can increase resilience by reloading data across
random seeds or hyperparameter variations.
DOI: 10.48550/arxiv.2311.15951
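As a rough illustration of the idea described in the abstract, the sketch below shows one way experience saved from earlier experiments could be mixed with freshly collected data when sampling training batches in an off-policy learner. The class name, buffer layout, and fixed mixing ratio are assumptions made for illustration only, not the authors' implementation.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): seed a new
# off-policy run with transitions reloaded from prior experiments and mix them
# with fresh data when sampling each training batch.
import random
from collections import deque


class MixedReplayBuffer:
    """Samples each batch partly from reloaded old-experiment data, partly from fresh data."""

    def __init__(self, capacity, old_transitions, old_fraction=0.5):
        self.fresh = deque(maxlen=capacity)  # transitions from the current experiment
        self.old = list(old_transitions)     # transitions reloaded from prior experiments
        self.old_fraction = old_fraction     # target share of each batch drawn from old data

    def add(self, transition):
        self.fresh.append(transition)

    def sample(self, batch_size):
        n_old = min(int(batch_size * self.old_fraction), len(self.old))
        n_fresh = min(batch_size - n_old, len(self.fresh))
        return random.sample(self.old, n_old) + random.sample(list(self.fresh), n_fresh)
```

A fixed mixing ratio is only one possible choice; the abstract notes that the paper's ablations examine robustness to the amount and quality of reloaded data and to hyperparameter choices, so the ratio here should be read as a tunable assumption.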