Extending Policy from One-Shot Learning through Coaching
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Abstract: | Humans generally teach their fellow collaborators to perform tasks through a
small number of demonstrations. The learnt task is corrected or extended to
meet specific task goals by means of coaching. Adopting a similar framework for
teaching robots through demonstrations and coaching makes teaching tasks highly
intuitive. Unlike traditional Learning from Demonstration (LfD) approaches,
which require multiple demonstrations, we present a one-shot LfD approach to
learning tasks. The learnt task is corrected and
generalized using two layers of evaluation/modification. First, the robot
self-evaluates its performance and corrects the performance to be closer to the
demonstrated task. Then, coaching is used as a means to extend the policy
learnt to be adaptable to varying task goals. Both the self-evaluation and
coaching are implemented using reinforcement learning (RL) methods. Coaching is
achieved through human feedback on the desired goal and on action modifications,
which generalizes the policy to specified task goals. The proposed approach is evaluated with a
scooping task, using a single demonstration. The self-evaluation
framework aims to reduce the resistance encountered while scooping in the media. To reduce the
search space for RL, we bootstrap the search with the least-resistance path
obtained from resistive force theory. Coaching is then used to generalize the
learnt task policy so that it transfers a desired quantity of material. Thus, the
proposed method provides a framework for learning a task from one demonstration
and generalizing it using human feedback through coaching. |
---|---|
DOI: | 10.48550/arxiv.1905.04841 |
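
The two-layer scheme described in the abstract (self-evaluation first, coaching second) can be illustrated with a small toy sketch. This is not the authors' implementation. A random-search hill climber stands in for the RL-based self-evaluation, a proportional correction stands in for RL-based coaching, and the resistance model, the scooped-amount model, the bootstrap values, and every function name are hypothetical placeholders chosen only to make the example runnable.

```python
import random


def self_evaluate(params, cost, iters=200, step=0.1, seed=0):
    """Stage 1 (stand-in for RL self-evaluation): local random search that
    starts from a bootstrap guess and keeps a perturbation only if it
    lowers the cost."""
    rng = random.Random(seed)
    best, best_cost = list(params), cost(params)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, step) for p in best]
        cand_cost = cost(cand)
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost


def coach(depth, scooped_amount, target, gain=0.5, rounds=20, tol=1e-3):
    """Stage 2 (stand-in for coaching): a simulated coach reports how far the
    outcome is from the desired goal, and the action is nudged accordingly."""
    for _ in range(rounds):
        error = target - scooped_amount(depth)
        if abs(error) < tol:
            break
        depth += gain * error
    return depth


# Hypothetical toy models, NOT derived from resistive force theory.
def toy_resistance(params):
    # Quadratic bowl with an assumed least-resistance point at (1.0, 0.5).
    return sum((p - q) ** 2 for p, q in zip(params, (1.0, 0.5)))


def toy_amount(depth):
    # Assumed linear relation between scoop depth and material transferred.
    return 1.5 * depth


# Assumed bootstrap guess, playing the role of the least-resistance prior.
bootstrap = [0.8, 0.3]
initial_cost = toy_resistance(bootstrap)
refined, refined_cost = self_evaluate(bootstrap, toy_resistance)

# Coaching then adapts one action parameter to an assumed target quantity.
coached_depth = coach(refined[1], toy_amount, target=1.2)
```

Starting the search from a prior rather than from scratch is what keeps the number of trials small in this toy setting, which mirrors the stated motivation for bootstrapping the RL search.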