Imitating Graph-Based Planning with Goal-Conditioned Policies
Format: Article
Language: English
Abstract: Recently, graph-based planning algorithms have gained much attention for solving goal-conditioned reinforcement learning (RL) tasks: they provide a sequence of subgoals for reaching the target goal, and the agent learns to execute subgoal-conditioned policies. However, the sample efficiency of such RL schemes remains a challenge, particularly for long-horizon tasks. To address this issue, we present a simple yet effective self-imitation scheme that distills a subgoal-conditioned policy into the target-goal-conditioned policy. Our intuition is that to reach a target goal, an agent must pass through a subgoal, so the target-goal- and subgoal-conditioned policies should be similar to each other. We also propose a novel scheme of stochastically skipping executed subgoals in a planned path, which further improves performance. Unlike prior methods that utilize graph-based planning only in the execution phase, our method transfers knowledge from a planner, along with a graph, into policy learning. We empirically show that our method can significantly boost the sample efficiency of existing goal-conditioned RL methods on various long-horizon control tasks.
DOI: 10.48550/arxiv.2303.11166
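
The abstract describes two mechanisms: distilling the subgoal-conditioned policy into the target-goal-conditioned policy via self-imitation, and stochastically skipping subgoals in a planned path. The following is a minimal PyTorch-style sketch of both ideas; the network architecture, function names, skip probability, and the mean-squared-error imitation loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """Hypothetical deterministic goal-conditioned policy pi(s, g) -> a."""
    def __init__(self, state_dim: int, goal_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, goal], dim=-1))

def self_imitation_loss(policy: GoalConditionedPolicy,
                        state: torch.Tensor,
                        subgoal: torch.Tensor,
                        target_goal: torch.Tensor) -> torch.Tensor:
    """Distillation term: since reaching the target goal requires passing
    through the planned subgoal, the target-goal-conditioned action should
    imitate the (detached) subgoal-conditioned action."""
    with torch.no_grad():
        teacher_action = policy(state, subgoal)   # subgoal branch acts as teacher
    student_action = policy(state, target_goal)   # target-goal branch is the student
    return ((student_action - teacher_action) ** 2).mean()

def skip_subgoals(planned_path: list, skip_prob: float = 0.3) -> list:
    """Stochastically drop intermediate subgoals from a planned path while
    always keeping the final target goal."""
    kept = [g for g in planned_path[:-1] if torch.rand(()).item() >= skip_prob]
    return kept + [planned_path[-1]]

# Example usage with random tensors standing in for a sampled transition.
policy = GoalConditionedPolicy(state_dim=4, goal_dim=2, action_dim=2)
state, subgoal, goal = torch.randn(8, 4), torch.randn(8, 2), torch.randn(8, 2)
loss = self_imitation_loss(policy, state, subgoal, goal)
loss.backward()  # gradients flow only through the target-goal branch
```

Detaching the subgoal-conditioned action ensures the distillation only pushes the target-goal-conditioned behavior toward the (typically easier-to-learn) subgoal-conditioned behavior, not the reverse.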