State-Conditioned Adversarial Subgoal Generation
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Summary:

Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from the problem of a non-stationary high-level policy, since the low-level policy is constantly changing. In this paper, we propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training, concurrently with the high-level policy, a simple state-conditioned discriminator network that determines the compatibility level of subgoals. Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance in challenging continuous control tasks.
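The abstract only sketches this mechanism, so the following is a minimal PyTorch sketch of one way such a state-conditioned discriminator and adversarial objective could be wired up. The labeling scheme (treating subgoals the current low-level policy actually achieved as "compatible" positives), the network shapes, and all names (`SubgoalDiscriminator`, `discriminator_loss`, `high_level_compatibility_loss`) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumptions as noted above): a GAN-style objective where
# D(s, g) learns to separate subgoals the low-level policy can reach from
# subgoals the high-level policy proposes, and the high-level policy is
# pushed toward the region D judges compatible.
import torch
import torch.nn as nn

class SubgoalDiscriminator(nn.Module):
    """Outputs a logit scoring how compatible subgoal g is with the
    current low-level policy when starting from state s."""
    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # high logit = compatible
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, goal], dim=-1))

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(disc, state, achieved_goal, proposed_goal):
    """Train D: subgoals actually achieved by the low-level policy
    (e.g., relabeled from its own trajectories -- an assumption here)
    are positives; raw high-level proposals are negatives."""
    real = disc(state, achieved_goal)
    fake = disc(state, proposed_goal.detach())  # no gradient to the proposer
    return bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))

def high_level_compatibility_loss(disc, state, proposed_goal):
    """Generator-style term added to the high-level policy objective:
    make proposed subgoals look compatible to the discriminator
    (gradients flow to the subgoal proposer, not to D's optimizer)."""
    logits = disc(state, proposed_goal)
    return bce(logits, torch.ones_like(logits))
```

In a full agent, this compatibility term would presumably be combined with the usual high-level TD objective; detaching the proposed subgoal in the discriminator step, and excluding the discriminator's parameters from the high-level optimizer, keeps the two updates from interfering, as in standard adversarial training.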
DOI: 10.48550/arxiv.2201.09635