Learning to Discover Skills through Guidance
In the field of unsupervised skill discovery (USD), a major challenge is limited exploration, primarily due to substantial penalties when skills deviate from their initial trajectories. To enhance exploration, recent methodologies employ auxiliary rewards to maximize the epistemic uncertainty or ent...
Gespeichert in:
Hauptverfasser: | , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In the field of unsupervised skill discovery (USD), a major challenge is
limited exploration, primarily due to substantial penalties when skills deviate
from their initial trajectories. To enhance exploration, recent methodologies
employ auxiliary rewards to maximize the epistemic uncertainty or entropy of
states. However, we have identified that the effectiveness of these rewards
declines as the environmental complexity rises. Therefore, we present a novel
USD algorithm, skill discovery with guidance (DISCO-DANCE), which (1) selects
the guide skill that possesses the highest potential to reach unexplored
states, (2) guides other skills to follow guide skill, then (3) the guided
skills are dispersed to maximize their discriminability in unexplored states.
Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD
baselines in challenging environments, including two navigation benchmarks and
a continuous control benchmark. Qualitative visualizations and code of
DISCO-DANCE are available at https://mynsng.github.io/discodance. |
---|---|
DOI: | 10.48550/arxiv.2310.20178 |