TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations
Format: Article
Language: English
Abstract: Unsupervised goal-conditioned reinforcement learning (GCRL) is a promising
paradigm for developing diverse robotic skills without external supervision.
However, existing unsupervised GCRL methods often struggle to cover a wide
range of states in complex environments due to their limited exploration and
sparse or noisy rewards for GCRL. To overcome these challenges, we propose a
novel unsupervised GCRL method that leverages TemporaL Distance-aware
Representations (TLDR). Based on temporal distance, TLDR selects faraway goals
to initiate exploration and computes intrinsic exploration rewards and
goal-reaching rewards. Specifically, our exploration policy seeks states with
large temporal distances (i.e. covering a large state space), while the
goal-conditioned policy learns to minimize the temporal distance to the goal
(i.e. reaching the goal). Our results in six simulated locomotion environments
demonstrate that TLDR significantly outperforms prior unsupervised GCRL methods
in achieving a wide range of states.
DOI: 10.48550/arxiv.2407.08464
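The core mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `phi` stands in for a learned temporal-distance-aware encoder (identity here for demonstration), and the function names, the min-over-visited exploration reward, and the argmax goal selection are all illustrative assumptions about how such rewards might be composed.

```python
import numpy as np

def phi(state: np.ndarray) -> np.ndarray:
    # Placeholder for a learned encoder mapping states to a space where
    # Euclidean distance approximates temporal distance (illustrative:
    # identity, whereas the paper learns this representation).
    return state

def temporal_distance(s: np.ndarray, g: np.ndarray) -> float:
    # Distance in representation space, used as a proxy for the number
    # of environment steps between states.
    return float(np.linalg.norm(phi(s) - phi(g)))

def goal_reaching_reward(s: np.ndarray, g: np.ndarray) -> float:
    # The goal-conditioned policy minimizes temporal distance to the goal,
    # i.e. maximizes its negation.
    return -temporal_distance(s, g)

def exploration_reward(s: np.ndarray, visited: list) -> float:
    # The exploration policy seeks states with large temporal distance
    # from previously visited states (hypothetical min-over-visited form).
    return min(temporal_distance(s, v) for v in visited)

def select_goal(candidates: list, visited: list) -> np.ndarray:
    # Select a faraway goal to initiate exploration: the candidate
    # farthest (in temporal distance) from the visited set.
    return max(candidates, key=lambda c: exploration_reward(c, visited))
```

For example, with `visited = [np.zeros(2)]`, `select_goal` prefers a candidate at `[5, 0]` over one at `[1, 0]`, matching the abstract's intuition that faraway goals drive state coverage.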