Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies

Detailed description

Bibliographic details
Published in:	IEEE Robotics and Automation Letters, 2024-02, Vol. 9 (2), pp. 1294-1301
Main authors:	Wang, Qiang; McCarthy, Robert; Bulens, David Cordova; Sanchez, Francisco Roldan; McGuinness, Kevin; O'Connor, Noel E.; Redmond, Stephen J.
Format:	Article
Language:	English
Subjects:	Algorithms; Behavioral sciences; Classification algorithms; Cloning; Data models; Data sets for robot learning; Datasets; Imitation learning; Machine learning; Policies; Reinforcement learning; Robot kinematics; Robot learning; Task analysis; Training

Description
Abstract:	This letter presents our solution for the Real Robot Challenge III, which addresses dexterous robotic manipulation tasks through learning from offline data. In this competition, participants were given two types of datasets for each task: expert and mixed. Each expert dataset is collected by a high-skill policy, whereas each mixed dataset is collected using both expert and non-expert policies. We found that vanilla behavioural cloning (BC) can learn a very proficient policy with minimal human intervention when trained on expert datasets. Notably, BC outperformed even the most advanced offline reinforcement learning (RL) algorithms. However, when applied to mixed datasets, the performance of BC deteriorates; the performance of offline RL algorithms is likewise less than satisfactory. Upon examining the provided datasets, it was apparent that each mixed dataset contained a significant proportion of expert data, which should enable the training of a proficient BC agent. However, the expert data is not labelled in the datasets. We therefore propose a classifier that identifies the pattern of expert behaviour within a mixed dataset and then use it to isolate the expert data. To further boost BC performance, we take advantage of the geometric symmetry of the arena to augment the training dataset through mathematical transformations. Ultimately, our submission outperformed those of all other participants.
ISSN:	2377-3766
DOI:	10.1109/LRA.2023.3342559
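
Illustrative note: the abstract mentions augmenting the training dataset through the geometric symmetry of the arena, but this record does not say which transformations were used. The following is a minimal sketch of the general idea only, not the authors' implementation. It assumes a three-fold rotational symmetry about the arena centre and a hypothetical data layout in which each transition is a pair of 2D vectors (a position and an action) expressed in a frame centred on the symmetry axis. The names `rotate_xy` and `augment_by_rotation` are invented for this example.

```python
import math


def rotate_xy(x, y, angle_rad):
    """Rotate the point (x, y) counter-clockwise about the origin."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y)


def augment_by_rotation(transitions, n_fold=3):
    """Return the original transitions plus their rotated copies.

    Each transition is a hypothetical (position_xy, action_xy) pair.
    n_fold=3 is an assumption (rotations by 0, 120 and 240 degrees);
    the catalogue record only states that a geometric symmetry is used.
    """
    augmented = []
    for k in range(n_fold):
        angle = 2.0 * math.pi * k / n_fold  # k = 0 keeps the original data
        for (px, py), (ax, ay) in transitions:
            augmented.append(
                (rotate_xy(px, py, angle), rotate_xy(ax, ay, angle))
            )
    return augmented


# One made-up transition becomes n_fold transitions after augmentation.
data = [((0.10, 0.00), (0.00, 0.01))]
augmented = augment_by_rotation(data)
print(len(augmented))
```

Rotating the state and the action by the same angle keeps each augmented pair consistent, which is what lets a behavioural cloning learner treat the copies as extra demonstrations. On a real multi-finger robot, joint-space observations and actions would also need the finger indices permuted to match the rotation; that step is omitted here.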