Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations
Format: Article
Language: English
Abstract: Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results are often achieved at the expense of enormous computational cost and require a very large number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods: hierarchical methods and expert demonstrations. In this paper, we propose a combination of these approaches that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our forgetful experience replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and state representation to the agent's capabilities. Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is universal and can be integrated into various off-policy methods. It surpasses existing state-of-the-art RL methods that use expert demonstrations on a variety of model environments. A solution based on our algorithm beats all the solutions to the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
DOI: 10.48550/arxiv.2006.09939
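
The abstract gives only a high-level picture of the goal-oriented replay buffer and the gradual forgetting of expert experience. The sketch below is a minimal illustration of that idea, assuming a per-sub-goal buffer and an annealed expert sampling ratio; the class and parameter names (GoalReplayBuffer, expert_ratio, decay) are hypothetical and do not come from the paper, whose actual ForgER implementation may differ.

```python
# Illustrative sketch only: a goal-keyed replay buffer that mixes expert
# demonstrations with agent experience and gradually "forgets" the expert
# data. Names and the annealing schedule are assumptions, not the paper's
# actual ForgER implementation.
import random
from collections import defaultdict, deque


class GoalReplayBuffer:
    def __init__(self, capacity_per_goal=10_000, expert_ratio=0.5, decay=0.999):
        # One deque per sub-goal, kept separately for expert and agent data.
        self.expert = defaultdict(lambda: deque(maxlen=capacity_per_goal))
        self.agent = defaultdict(lambda: deque(maxlen=capacity_per_goal))
        self.expert_ratio = expert_ratio  # initial share of expert samples
        self.decay = decay                # forgetting factor applied per sample call

    def add(self, goal, transition, from_expert=False):
        (self.expert if from_expert else self.agent)[goal].append(transition)

    def sample(self, goal, batch_size):
        # Anneal the expert share so noisy demonstrations are phased out
        # as the agent collects its own experience.
        self.expert_ratio *= self.decay
        n_expert = min(int(batch_size * self.expert_ratio), len(self.expert[goal]))
        n_agent = min(batch_size - n_expert, len(self.agent[goal]))
        batch = random.sample(list(self.expert[goal]), n_expert) + \
                random.sample(list(self.agent[goal]), n_agent)
        random.shuffle(batch)
        return batch


if __name__ == "__main__":
    buf = GoalReplayBuffer()
    # Hypothetical transitions: (state, action, reward, next_state, done).
    buf.add("get_wood", ("s0", "a0", 1.0, "s1", False), from_expert=True)
    buf.add("get_wood", ("s1", "a1", 0.0, "s2", False))
    print(buf.sample("get_wood", batch_size=2))
```

In this sketch, the per-goal separation stands in for the paper's goal-oriented structuring of the buffer, and the decaying expert ratio stands in for the "forgetting" of imperfect demonstrations; any off-policy learner could draw its training batches from such a buffer.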