Compact Goal Representation Learning via Information Bottleneck in Goal-Conditioned Reinforcement Learning

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-01, Vol. PP, p. 1-14
Main authors: Zou, Qiming; Suzuki, Einoshin
Format: Article
Language: English
Online access: Full text
Description
Summary: We propose an Information bottleneck (IB) for Goal representation learning (InfoGoal), a self-supervised method for generalizable goal-conditioned reinforcement learning (RL). Goal-conditioned RL learns a policy from reward signals to predict actions for reaching desired goals. However, the policy can overfit to task-irrelevant information contained in the goal and may then generalize incorrectly or ineffectively to other goals. A goal representation containing sufficient task-relevant information and minimum task-irrelevant information is guaranteed to reduce generalization errors. In goal-conditioned RL, however, it is difficult to balance the tradeoff between task-relevant and task-irrelevant information, because the learning signals (i.e., reward signals) are sparse and delayed and because information compression inevitably sacrifices some task-relevant information. InfoGoal learns a minimum and sufficient goal representation with dense and immediate self-supervised learning signals. Meanwhile, InfoGoal adaptively adjusts the weight of information minimization to achieve maximum information compression with a reasonable sacrifice of task-relevant information. Consequently, InfoGoal enables the policy to generate a targeted trajectory toward states where the desired goal can be found with high probability and to explore those states broadly. We conduct experiments on both simulated and real-world tasks, and our method significantly outperforms baseline methods in terms of policy optimality and the success rate of reaching unseen test goals. Video demos are available at infogoal.github.io.
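Note: The summary does not state the objective or the weight-adaptation rule, so the following is only a generic sketch of a variational information bottleneck loss with an adjustable compression weight, not InfoGoal's actual method. It assumes a Gaussian goal encoder with a standard-normal prior; the names `gaussian_kl`, `ib_loss`, `update_beta`, and the parameters `tolerance` and `step` are hypothetical and introduced for illustration.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), averaged over the batch.

    Used here as a stand-in upper bound on the information the
    representation keeps about the goal (an assumption of this sketch).
    """
    kl_per_dim = 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return float(kl_per_dim.sum(axis=1).mean())


def ib_loss(task_loss, mu, log_var, beta):
    """Task term plus beta-weighted compression term."""
    return task_loss + beta * gaussian_kl(mu, log_var)


def update_beta(beta, task_loss, tolerance, step=0.01,
                beta_min=0.0, beta_max=10.0):
    """Hypothetical adaptive rule (not taken from the paper).

    Increase beta (compress more) while the task loss stays below the
    tolerance; decrease it when the task loss exceeds the tolerance.
    """
    beta = beta + step * (tolerance - task_loss)
    return float(np.clip(beta, beta_min, beta_max))
```

In this sketch, `tolerance` plays the role of the "reasonable sacrifice" of task-relevant information: compression is pushed harder only while the task term remains acceptable.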
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2023.3344880