Compact Goal Representation Learning via Information Bottleneck in Goal-Conditioned Reinforcement Learning
Published in: | IEEE Transactions on Neural Networks and Learning Systems 2024-01, Vol.PP, p.1-14 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Summary: | We propose an Information bottleneck (IB) for Goal representation learning (InfoGoal), a self-supervised method for generalizable goal-conditioned reinforcement learning (RL). Goal-conditioned RL learns a policy from reward signals to predict actions for reaching desired goals. However, the policy can overfit to the task-irrelevant information contained in the goal and may then generalize incorrectly or ineffectively to other goals. A goal representation containing sufficient task-relevant information and minimum task-irrelevant information is guaranteed to reduce generalization errors. In goal-conditioned RL, however, it is difficult to balance the tradeoff between task-relevant and task-irrelevant information, both because the learning signals (i.e., reward signals) are sparse and delayed and because information compression inevitably sacrifices some task-relevant information. InfoGoal learns a minimum and sufficient goal representation with dense and immediate self-supervised learning signals. Meanwhile, InfoGoal adaptively adjusts the weight of information minimization to achieve maximum information compression with a reasonable sacrifice of task-relevant information. Consequently, InfoGoal enables the policy to generate a targeted trajectory toward states where the desired goal can be found with high probability and to explore those states broadly. We conduct experiments on both simulated and real-world tasks, and our method significantly outperforms baseline methods in terms of policy optimality and the success rate of reaching unseen test goals. Video demos are available at infogoal.github.io. |
---|---|
ISSN: | 2162-237X, 2162-2388 |
DOI: | 10.1109/TNNLS.2023.3344880 |
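
The summary describes two ingredients: a compression term that minimizes the information kept in the goal representation, and a weight on that term that is adjusted adaptively. The sketch below illustrates this idea in generic variational-bottleneck form and is not the paper's implementation. It assumes a diagonal-Gaussian goal encoder whose KL divergence to a standard normal serves as the compression term, and the function names, the `tolerance` threshold, and the multiplicative update in `adapt_beta` are all hypothetical choices made for illustration.

```python
import math


def gaussian_kl_to_standard_normal(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, I)) for a diagonal Gaussian.

    Assumption: this KL term stands in for the information-minimization
    (compression) term; the record does not state the paper's exact form.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mean, log_var)
    )


def ib_loss(task_loss, kl, beta):
    """Bottleneck-style objective: task term plus beta-weighted compression."""
    return task_loss + beta * kl


def adapt_beta(beta, task_loss, tolerance, step=0.1,
               beta_min=1e-4, beta_max=10.0):
    """Hypothetical adaptive rule for the compression weight.

    Compress harder while the task loss stays within `tolerance`,
    and relax the compression once the task loss exceeds it.
    """
    if task_loss <= tolerance:
        beta *= math.exp(step)
    else:
        beta *= math.exp(-step)
    # Keep the weight inside a fixed range so it cannot diverge.
    return min(max(beta, beta_min), beta_max)
```

In a training loop, `adapt_beta` would be called once per update with the current self-supervised task loss, so the compression weight rises only while the representation remains sufficient for the task under this assumed criterion.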