ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Effective robot learning often requires online human feedback and
interventions that can cost significant human time, giving rise to the central
challenge in interactive imitation learning: is it possible to control the
timing and length of interventions to both facilitate learning and limit burden
on the human supervisor? This paper presents ThriftyDAgger, an algorithm for
actively querying a human supervisor given a desired budget of human
interventions. ThriftyDAgger uses a learned switching policy to solicit
interventions only at states that are sufficiently (1) novel, where the robot
policy has no reference behavior to imitate, or (2) risky, where the robot has
low confidence in task completion. To detect the latter, we introduce a novel
metric for estimating risk under the current robot policy. Experiments in
simulation and on a physical cable routing experiment suggest that
ThriftyDAgger's intervention criteria balance task performance and supervisor
burden more effectively than prior algorithms. ThriftyDAgger can also be
applied at execution time, where it achieves a 100% success rate on both the
simulation and physical tasks. A user study (N=10) in which users control a
three-robot fleet while also performing a concentration task suggests that
ThriftyDAgger increases human and robot performance by 58% and 80%, respectively,
compared to the next best algorithm while reducing supervisor burden. |
---|---|
DOI: | 10.48550/arxiv.2109.08273 |
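
The switching rule described in the summary (query the supervisor only at states that are sufficiently novel or risky, subject to a desired intervention budget) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `quantile_threshold`, `SwitchingGate`, and `calibrate_gate` are hypothetical, the novelty and risk scores are assumed to be precomputed numbers (with risk read as one minus an estimated probability of task completion), and the even split of the budget between the two criteria is an assumption made only to keep the example small.

```python
from dataclasses import dataclass
from typing import Sequence


def quantile_threshold(scores: Sequence[float], fraction: float) -> float:
    """Return a threshold that roughly the top `fraction` of scores meet or exceed.

    Hypothetical helper: picks an empirical quantile of previously seen scores.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    ordered = sorted(scores)
    if not ordered:
        raise ValueError("scores must be non-empty")
    # Position of the (1 - fraction) quantile, clamped to a valid index.
    idx = min(len(ordered) - 1, int((1.0 - fraction) * len(ordered)))
    return ordered[idx]


@dataclass
class SwitchingGate:
    """Hypothetical gate: ask for help when a state looks novel OR risky."""

    novelty_threshold: float
    risk_threshold: float

    def should_query(self, novelty: float, risk: float) -> bool:
        # Either criterion alone is enough to request an intervention.
        return novelty >= self.novelty_threshold or risk >= self.risk_threshold


def calibrate_gate(
    novelty_scores: Sequence[float],
    risk_scores: Sequence[float],
    budget: float,
) -> SwitchingGate:
    """Set both thresholds from past scores for a target intervention fraction.

    Assumption: the budget is split evenly between the two criteria, so the
    combined query rate on the calibration data does not exceed `budget`
    when scores are distinct.
    """
    per_criterion = budget / 2.0
    return SwitchingGate(
        novelty_threshold=quantile_threshold(novelty_scores, per_criterion),
        risk_threshold=quantile_threshold(risk_scores, per_criterion),
    )


if __name__ == "__main__":
    # Toy scores for eight previously visited states (illustrative values only).
    novelty = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    risk = [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
    gate = calibrate_gate(novelty, risk, budget=0.25)
    queried = [gate.should_query(n, r) for n, r in zip(novelty, risk)]
    print(f"queried {sum(queried)} of {len(queried)} states")
```

Because the two criteria are combined with a logical OR, the realized query rate depends on how much the novel and risky states overlap. Calibrating each threshold against half of the budget is one conservative choice; other splits are equally plausible.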