DITTO: Offline Imitation Learning with World Models
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We propose DITTO, an offline imitation learning algorithm which uses world
models and on-policy reinforcement learning to address the problem of
covariate shift, without access to an oracle or any additional online
interactions. We discuss how world models enable offline, on-policy imitation
learning, and propose a simple intrinsic reward defined in the world model
latent space that induces imitation learning by reinforcement learning.
Theoretically, we show that our formulation induces a divergence bound between
expert and learner, in turn bounding the difference in reward. We test our
method on difficult Atari environments from pixels alone, and achieve
state-of-the-art performance in the offline setting. |
---|---|
DOI: | 10.48550/arxiv.2302.03086 |