Automated curricula through setter-solver interactions
Published at the International Conference on Learning Representations, 2020
Format: Article
Language: English
Abstract: Reinforcement learning algorithms use correlations between policies and rewards to improve agent performance. But in dynamic or sparsely rewarding environments, these correlations are often too small, or rewarding events too infrequent, to make learning feasible. Human education instead relies on curricula (the breakdown of tasks into simpler, static challenges with dense rewards) to build up to complex behaviors. While curricula are also useful for artificial agents, hand-crafting them is time-consuming. This has led researchers to explore automatic curriculum generation. Here we explore automatic curriculum generation in rich, dynamic environments. Using a setter-solver paradigm, we show the importance of considering goal validity, goal feasibility, and goal coverage to construct useful curricula. We demonstrate the success of our approach in rich but sparsely rewarding 2D and 3D environments, where an agent is tasked to achieve a single goal selected from a set of possible goals that varies between episodes, and identify challenges for future work. Finally, we demonstrate the value of a novel technique that guides agents towards a desired goal distribution. Altogether, these results represent a substantial step towards applying automatic task curricula to learn complex, otherwise unlearnable goals, and to our knowledge are the first to demonstrate automated curriculum generation for goal-conditioned agents in environments where the possible goals vary between episodes.
DOI: 10.48550/arxiv.1909.12892
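The abstract describes the setter-solver paradigm only at a high level: a setter proposes goals, a goal-conditioned solver attempts them, and the setter is trained with validity, feasibility, and coverage objectives. The following is a minimal toy sketch of that interaction, not the paper's actual method: the scalar goal space, the ToySolver, ToyJudge, and setter_sample names, and the target feasibility of 0.5 are all hypothetical simplifications. In the paper the setter is a learned generative model; here coverage is approximated by uniform random proposals, feasibility by a small logistic "judge", and validity is trivial because every scalar goal is well-formed.

```python
import numpy as np

rng = np.random.default_rng(0)


class ToySolver:
    """Stand-in for a goal-conditioned agent: its success probability rises as
    a scalar 'skill' overtakes the goal's difficulty, and practicing on solved
    goals slowly raises that skill (toy assumption, not the paper's agent)."""

    def __init__(self):
        self.skill = 0.1

    def attempt(self, goal):
        p_success = 1.0 / (1.0 + np.exp(20.0 * (goal - self.skill)))
        success = rng.random() < p_success
        if success:
            self.skill = min(1.0, self.skill + 0.01 * goal)
        return success


class ToyJudge:
    """Logistic model of P(solver succeeds | goal), updated online from the
    solver's attempts; plays the role of the paper's feasibility predictor."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def predict(self, goal):
        return 1.0 / (1.0 + np.exp(-(self.w * goal + self.b)))

    def update(self, goal, success, lr=0.5):
        err = float(success) - self.predict(goal)  # log-loss gradient step
        self.w += lr * err * goal
        self.b += lr * err


def setter_sample(judge, target_feasibility=0.5, n_candidates=64):
    """Crude setter: propose random goals (coverage) and return the one whose
    judged feasibility is closest to the target (feasibility). Validity is
    trivial here since every scalar in [0, 1] is a well-formed goal."""
    candidates = rng.random(n_candidates)
    feasibility = np.array([judge.predict(g) for g in candidates])
    return candidates[np.argmin(np.abs(feasibility - target_feasibility))]


if __name__ == "__main__":
    solver, judge = ToySolver(), ToyJudge()
    for step in range(2001):
        goal = setter_sample(judge)
        success = solver.attempt(goal)
        judge.update(goal, success)
        if step % 500 == 0:
            print(f"step {step}: solver skill {solver.skill:.2f}, "
                  f"set goal {goal:.2f}")
```

Run as-is, the printed goals track the solver's growing skill: the setter keeps selecting goals near the frontier of what the judge believes is achievable, which is the curriculum effect the abstract describes, albeit in drastically simplified form.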