Off-policy adversarial imitation learning for robotic tasks with low-quality demonstrations
Saved in:
Published in: Applied Soft Computing, 2020-12, Vol. 97, p. 106795, Article 106795
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: The goal of imitation learning (IL) is to enable a robot to imitate expert behavior given expert demonstrations. Adversarial imitation learning (AIL) is a recent, successful IL architecture that has shown significant progress on complex continuous tasks, particularly robotic tasks. However, in most cases the acquisition of high-quality demonstrations is costly and laborious, which poses a significant challenge for AIL methods. Although generative adversarial imitation learning (GAIL) and its extensions have been shown to be robust to sub-optimal experts, it is difficult for them to surpass the performance of the expert by a large margin. To address this issue, in this paper we propose a novel off-policy AIL method called robust adversarial imitation learning (RAIL). To enable the agent to significantly outperform the sub-optimal expert providing the demonstrations, the hindsight idea of variable reward (VR) is first incorporated into the off-policy AIL framework. Then, a strategy called hindsight copy (HC) of demonstrations is designed to provide the discriminator and the trained policy in the AIL framework with different demonstrations, maximizing the use of the available demonstrations and speeding up learning. Experiments were conducted on two multi-goal robotic tasks to test the proposed method. The results show that our method is not limited by the quality of the expert demonstrations and can outperform other IL approaches.
Highlights:
• An off-policy actor-critic architecture is used in the adversarial imitation learning (AIL) framework.
• The hindsight idea of variable reward (VR) is incorporated into the off-policy AIL framework.
• A strategy of hindsight copy (HC) is designed for sampling demonstrations.
• A convergence analysis of the proposed method is provided.
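The full text is not part of this record, so as a rough illustration only, here is a minimal PyTorch sketch of the two ingredients the abstract names: a GAIL-style discriminator that supplies surrogate rewards to an off-policy learner, and hindsight relabelling of demonstration goals (the basis of the VR and HC ideas). All names (`Discriminator`, `adversarial_reward`, `hindsight_relabel`), the episode layout, and the exact reward form are assumptions for illustration, not the authors' API.

```python
# Minimal sketch (NOT the authors' code) of a GAIL-style discriminator reward
# and hindsight relabelling of demonstrations, as described in the abstract.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Scores (state, action, goal) tuples; higher logits mean "more expert-like"."""

    def __init__(self, obs_dim, act_dim, goal_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act, goal):
        return self.net(torch.cat([obs, act, goal], dim=-1))


def adversarial_reward(disc, obs, act, goal):
    """GAIL-style surrogate reward r = -log(1 - D(s, a, g)), used in place of
    an environment reward when training the off-policy learner."""
    with torch.no_grad():
        d = torch.sigmoid(disc(obs, act, goal))
    return -torch.log(1.0 - d + 1e-8)


def hindsight_relabel(episode):
    """Hindsight relabelling: treat the goal the demonstrator actually achieved
    as the intended goal, so even a failed (low-quality) demonstration becomes
    a success for the relabelled goal. `episode` is assumed to be a dict with
    lists under the keys "goals" and "achieved_goals"."""
    achieved = episode["achieved_goals"][-1]
    relabelled = dict(episode)
    relabelled["goals"] = [achieved] * len(episode["goals"])
    return relabelled
```

In the spirit of the HC idea, one copy of each demonstration (for instance the relabelled one) could be fed to the discriminator while another is placed in the policy's replay buffer; the abstract does not specify the exact split, so that detail is deliberately left open here.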
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2020.106795