Task-Relevant Adversarial Imitation Learning
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We show that a critical vulnerability in adversarial imitation is the
tendency of discriminator networks to learn spurious associations between
visual features and expert labels. When the discriminator focuses on
task-irrelevant features, it does not provide an informative reward signal,
leading to poor task performance. We analyze this problem in detail and propose
a solution that outperforms standard Generative Adversarial Imitation Learning
(GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning
(TRAIL), uses constrained discriminator optimization to learn informative
rewards. In comprehensive experiments, we show that TRAIL can solve challenging
robotic manipulation tasks from pixels by imitating human operators without
access to any task rewards, and clearly outperforms comparable baseline
imitation agents, including those trained via behaviour cloning and
conventional GAIL. |
---|---|
DOI: | 10.48550/arxiv.1910.01077 |