World Model as a Graph: Learning Latent Landmarks for Planning
| Field | Value |
|---|---|
| Main authors | , , |
| Format | Article |
| Language | English |
| Subjects | |
| Online access | Order full text |
Abstract:

International Conference on Machine Learning (ICML), 2021. Planning - the ability to analyze the structure of a problem in the large and decompose it into interrelated subproblems - is a hallmark of human intelligence. While deep reinforcement learning (RL) has shown great promise for solving relatively straightforward control tasks, it remains an open problem how to best incorporate planning into existing deep RL paradigms to handle increasingly complex environments. One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts. This type of world model quickly diverges from reality as the planning horizon increases, and thus struggles at long-horizon planning. How can we learn world models that endow agents with the ability to do temporally extended reasoning? In this work, we propose to learn graph-structured world models composed of sparse, multi-step transitions. We devise a novel algorithm to learn latent landmarks that are scattered (in terms of reachability) across the goal space as the nodes on the graph. In this same graph, the edges are reachability estimates distilled from Q-functions. On a variety of high-dimensional continuous control tasks ranging from robotic manipulation to navigation, we demonstrate that our method, named L3P, significantly outperforms prior work, and is oftentimes the only method capable of leveraging both the robustness of model-free RL and the generalization of graph-search algorithms. We believe our work is an important step towards scalable planning in reinforcement learning.
DOI: 10.48550/arxiv.2011.12491
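The abstract compresses the planning pipeline into two sentences: learned latent landmarks are the nodes of a graph, edges carry reachability estimates distilled from Q-functions, and graph search produces temporally extended plans. The sketch below is a minimal Python illustration of that loop, not the paper's implementation: `dist_fn` stands in for the Q-distilled reachability estimate (e.g. -Q(a, b) under a -1-per-step goal-reaching reward), `landmarks` stands in for the learned latent landmarks, and the names `build_graph`, `dijkstra`, `next_subgoal`, and the pruning threshold `max_edge_dist` are all hypothetical choices made for this example.

```python
import heapq

def build_graph(nodes, dist_fn, max_edge_dist):
    """Directed graph over goals with Q-distilled edge costs.

    dist_fn(a, b) is assumed to return an estimated step count from a to b
    (e.g. -Q(a, b) when the reward is -1 per step, a common convention in
    goal-reaching RL). Edges above max_edge_dist are pruned so the graph
    keeps only short, reliably executable multi-step transitions.
    """
    graph = {i: [] for i in range(len(nodes))}
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if i != j:
                d = dist_fn(a, b)
                if d <= max_edge_dist:
                    graph[i].append((j, d))
    return graph

def dijkstra(graph, start, goal):
    """Shortest path by total estimated steps; returns node ids or None."""
    best = {start: 0.0}
    prev = {}
    frontier = [(0.0, start)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == goal:  # reconstruct the path back to the start
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > best[u]:  # stale queue entry, skip
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(frontier, (nd, v))
    return None  # goal unreachable within the pruned graph

def next_subgoal(state, final_goal, landmarks, dist_fn, max_edge_dist=15.0):
    """Plan state -> landmarks -> final_goal; return the first waypoint.

    The returned subgoal is what a low-level goal-conditioned policy would
    be asked to reach before the agent replans.
    """
    nodes = [state] + list(landmarks) + [final_goal]
    graph = build_graph(nodes, dist_fn, max_edge_dist)
    path = dijkstra(graph, start=0, goal=len(nodes) - 1)
    if path is None or len(path) < 2:
        return final_goal  # no graph route found: fall back to the raw goal
    return nodes[path[1]]
```

In the paper itself the landmark set is learned (latent centroids decoded into the goal space) and the reachability estimates come from a distilled value function; here both are simply passed in as given, which is the main simplification of this sketch.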