Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories
Saved in:
Main author(s): | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent the challenges associated with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and reproduces the demonstrated behavior more accurately with fewer guiding trajectories than standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns lifting, target-reaching, and obstacle-avoidance behaviors. |
DOI: | 10.48550/arxiv.2105.03019 |
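
To make the collocation idea in the abstract concrete, below is a minimal sketch, not the authors' implementation: the one-step dynamics function `step`, the network sizes, the horizon, and the loss weights are all hypothetical. It jointly optimizes a policy and a set of free auxiliary waypoints; the waypoints are pulled toward the demonstration, while a one-step consistency penalty ties consecutive waypoints together through the policy and dynamics, so gradients never need to be back-propagated through a long rollout.

```python
# Minimal collocation-style imitation learning sketch (hypothetical shapes, weights, dynamics).
import torch

def step(x, u):
    """Placeholder one-step dynamics x_{t+1} = f(x_t, u_t); assumed known and differentiable."""
    return x + 0.1 * u

T, state_dim, action_dim = 50, 7, 7
demo = torch.randn(T, state_dim)            # stand-in for a demonstrated state trajectory

policy = torch.nn.Sequential(               # pi_theta: state -> action
    torch.nn.Linear(state_dim, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, action_dim),
)
aux = torch.nn.Parameter(demo.clone())      # auxiliary trajectory, optimized jointly with the policy

opt = torch.optim.Adam(list(policy.parameters()) + [aux], lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    imitation = ((aux - demo) ** 2).mean()             # auxiliary trajectory tracks the demonstration
    pred_next = step(aux[:-1], policy(aux[:-1]))        # roll each waypoint forward by one step
    consistency = ((pred_next - aux[1:]) ** 2).mean()   # collocation constraint as a soft penalty
    loss = imitation + 10.0 * consistency                # penalty weight is an arbitrary choice here
    loss.backward()                                      # gradients stay local; no BPTT over the horizon
    opt.step()
```

Because each consistency term couples only neighboring waypoints, gradients stay local and the long-horizon credit assignment of back-propagation-through-time is avoided. Note that CoDE itself uses an auxiliary trajectory network rather than free waypoint parameters, so this sketch captures only the general collocation structure described in the abstract.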