Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising
paradigm to address the exploration-exploitation dilemma in reinforcement
learning. It decomposes the source task into subgoal-conditioned subtasks and
conducts exploration and exploitation in the subgoal space. The effectiveness
of GCHRL relies heavily on the subgoal representation function and the subgoal
selection strategy. However, existing works often overlook the temporal
coherence in GCHRL when learning latent subgoal representations and lack an
efficient subgoal selection strategy that balances exploration and
exploitation. This paper proposes HIerarchical reinforcement learning via
dynamically building Latent Landmark graphs (HILL) to overcome these
limitations. HILL learns latent subgoal representations that satisfy temporal
coherence using a contrastive representation learning objective. Based on these
representations, HILL dynamically builds latent landmark graphs and employs a
novelty measure on nodes and a utility measure on edges. Finally, HILL develops
a subgoal selection strategy that balances exploration and exploitation by
jointly considering both measures. Experimental results demonstrate that HILL
outperforms state-of-the-art baselines on continuous control tasks with sparse
rewards in sample efficiency and asymptotic performance. Our code is available
at https://github.com/papercode2022/HILL. |
---|---|
DOI: | 10.48550/arxiv.2307.12063 |
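
The summary describes a subgoal selection strategy that jointly considers a novelty measure on graph nodes and a utility measure on graph edges. The record does not give the formulas, so the following is only a minimal sketch of that general idea and not HILL's actual method. The count-based novelty, the additive score, the weight `beta`, and all class and function names are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LandmarkGraph:
    """Toy landmark graph (hypothetical): nodes have visit counts, edges have utilities."""
    visit_counts: Dict[str, int] = field(default_factory=dict)
    edge_utility: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def novelty(self, node: str) -> float:
        # Assumed count-based novelty: rarely visited landmarks score higher.
        return 1.0 / math.sqrt(1 + self.visit_counts.get(node, 0))

    def neighbors(self, node: str) -> List[str]:
        # Landmarks reachable from `node` by one outgoing edge.
        return [dst for (src, dst) in self.edge_utility if src == node]


def select_subgoal(graph: LandmarkGraph, current: str, beta: float = 0.5) -> str:
    """Pick the neighbor maximizing edge utility + beta * node novelty.

    beta = 0 is purely exploitative (utility only); a larger beta
    shifts the choice toward rarely visited landmarks (exploration).
    """
    candidates = graph.neighbors(current)
    if not candidates:
        raise ValueError(f"landmark {current!r} has no outgoing edges")

    def score(node: str) -> float:
        return graph.edge_utility[(current, node)] + beta * graph.novelty(node)

    return max(candidates, key=score)


# Illustrative data only: "b" is well explored and useful, "c" is unvisited.
demo = LandmarkGraph(
    visit_counts={"a": 10, "b": 99, "c": 0},
    edge_utility={("a", "b"): 0.9, ("a", "c"): 0.2},
)
exploit_choice = select_subgoal(demo, "a", beta=0.0)
explore_choice = select_subgoal(demo, "a", beta=1.0)
```

With the illustrative data above, the utility-only setting selects the well-explored landmark and the novelty-weighted setting selects the unvisited one, which is the kind of trade-off the summary refers to.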