ACNMP: Skill Transfer and Task Extrapolation through Learning from Demonstration and Reinforcement Learning via Representation Sharing
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | To equip robots with dexterous skills, an effective approach is to first
transfer the desired skill via Learning from Demonstration (LfD), then let the
robot improve it by self-exploration via Reinforcement Learning (RL). In this
paper, we propose a novel LfD+RL framework, namely Adaptive Conditional Neural
Movement Primitives (ACNMP), that allows efficient policy improvement in novel
environments and effective skill transfer between different agents. This is
achieved through exploiting the latent representation learned by the underlying
Conditional Neural Process (CNP) model, and simultaneous training of the model
with supervised learning (SL) for acquiring the demonstrated trajectories and
via RL for new trajectory discovery. Through simulation experiments, we show
that (i) ACNMP enables the system to extrapolate to situations where pure LfD
fails; (ii) simultaneous training of the system through SL and RL preserves the
shape of demonstrations while adapting to novel situations due to the shared
representations used by both learners; (iii) ACNMP enables order-of-magnitude
more sample-efficient RL in extrapolation of reaching tasks compared to
existing approaches; (iv) ACNMPs can be used to implement skill transfer between robots
with different morphologies, with competitive learning speeds and, importantly,
with fewer assumptions than state-of-the-art approaches.
Finally, we show the real-world suitability of ACNMPs through real robot
experiments that involve obstacle avoidance, pick-and-place, and pouring
actions. |
DOI: | 10.48550/arxiv.2003.11334 |
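
The summary attributes ACNMP's behaviour to the latent representation of an underlying Conditional Neural Process (CNP). As a rough illustration of that idea only, and not the authors' implementation, the sketch below runs a forward pass of a tiny CNP-style model in NumPy: each observed (time, value) pair is encoded, the encodings are averaged into one latent vector, and a decoder predicts a mean and standard deviation at query times. All names (`init_mlp`, `mlp`, `cnp_forward`), the layer sizes, and the random untrained weights are assumptions made for this example; the joint SL and RL training described in the summary is not shown.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (illustrative only)."""
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the network: tanh on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def cnp_forward(enc, dec, t_obs, y_obs, t_query):
    """Predict mean and std at t_query, conditioned on observed (t, y) pairs."""
    # Encode each observation separately, then average into one latent vector.
    r = mlp(enc, np.concatenate([t_obs, y_obs], axis=-1)).mean(axis=0)
    # Pair every query time with the same latent vector and decode.
    r_tiled = np.broadcast_to(r, (t_query.shape[0], r.shape[0]))
    out = mlp(dec, np.concatenate([t_query, r_tiled], axis=-1))
    mean, raw_std = np.split(out, 2, axis=-1)
    std = np.log1p(np.exp(raw_std)) + 1e-3  # softplus keeps std positive
    return mean, std, r

rng = np.random.default_rng(0)
latent_dim, y_dim = 8, 1  # hypothetical sizes chosen for the example
enc = init_mlp(rng, [1 + y_dim, 16, latent_dim])
dec = init_mlp(rng, [1 + latent_dim, 16, 2 * y_dim])

# Two conditioning points (for example, a start and a goal) on a 1-D trajectory.
t_obs = np.array([[0.0], [1.0]])
y_obs = np.array([[0.2], [0.9]])
t_query = np.linspace(0.0, 1.0, 5).reshape(-1, 1)

mean, std, r = cnp_forward(enc, dec, t_obs, y_obs, t_query)
print(mean.shape, std.shape, r.shape)
```

Because the encodings are averaged, the latent vector does not depend on the order of the conditioning points. In this sketch it is the single quantity the decoder relies on, which is one plausible reading of the "shared representation" the summary refers to.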