Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning

Bibliographic details
Published in:	IEEE robotics and automation letters 2020-10, Vol.5 (4), p.1-1
Main authors:	Lin, Yijiong; Huang, Jiancong; Zimmer, Matthieu; Guan, Yisheng; Rojas, Juan; Weng, Paul
Format:	Article
Language:	English
Subjects:	Adaptive control; AI-Based Methods; Data augmentation; Deep Learning; Dexterous Manipulation; Grippers; Invariants; Machine learning; Pick and place tasks; Reinforcement Learning; Robot control; Robotics; Robots; Task analysis; Training; Trajectory; Transforms

Description
Abstract:	Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries from observed trajectories define transformations that leave the space of feasible RL trajectories invariant and can be used to generate new feasible trajectories, which could be used for training. Based on this data augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay, that we present with two techniques: (i) Kaleidoscope Experience Replay, which exploits reflectional symmetries, and (ii) Goal-augmented Experience Replay, which takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning rates and success rates. In particular, we attain 13-, 3-, and 5-fold speedups in the pushing, sliding, and pick-and-place tasks, respectively, in the multi-goal setting. Performance gains are also observed in similar tasks with obstacles, and we successfully deployed a trained policy on a real Baxter robot. Our work demonstrates that invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL.
ISSN:	2377-3766
DOI:	10.1109/LRA.2020.3013937
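
The two augmentation ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simplified transition in which the observation, action, next observation, and goal are all 3-D Cartesian vectors expressed in a frame whose origin lies on the symmetry plane. The function names `reflect_transition` and `relabel_goal`, the `axis` and `tolerance` parameters, and the reading of "lax goal definitions" as "any goal within a success tolerance of the achieved position" are assumptions made for illustration.

```python
import numpy as np


def reflect_transition(obs, action, next_obs, goal, axis=1):
    """Mirror one transition across the plane where coordinate `axis` is zero.

    Assumption: every argument is a 3-D Cartesian vector, so reflecting
    simply flips the sign of one component.
    """
    mirror = np.ones(3)
    mirror[axis] = -1.0
    return obs * mirror, action * mirror, next_obs * mirror, goal * mirror


def relabel_goal(achieved, tolerance, rng):
    """Sample a substitute goal uniformly inside a ball of radius
    `tolerance` centred on the achieved position (hypothetical helper)."""
    direction = rng.normal(size=achieved.shape)
    direction /= np.linalg.norm(direction)
    # Scaling by u**(1/d) makes the sample uniform over the ball's volume.
    radius = tolerance * rng.uniform() ** (1.0 / achieved.size)
    return achieved + radius * direction


rng = np.random.default_rng(0)
obs = np.array([0.1, 0.2, 0.3])
action = np.array([0.0, 1.0, 0.0])
next_obs = np.array([0.1, 0.25, 0.3])
goal = np.array([0.1, 0.5, 0.3])

mirrored = reflect_transition(obs, action, next_obs, goal)
new_goal = relabel_goal(next_obs, tolerance=0.05, rng=rng)
```

In a replay-buffer setting, both the original and the transformed transitions would be stored, so each real environment step yields more than one training sample. A real Fetch observation also contains velocities and gripper state, which would need their own sign conventions under reflection.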