Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies
Published in: | IEEE Robotics and Automation Letters 2024-02, Vol.9 (2), p.1294-1301 |
---|---|
Format: | Article |
Language: | English |
Abstract: | This letter presents our solution for the Real Robot Challenge III, which addresses dexterous robotic manipulation tasks through learning from offline data. In this competition, participants were given two types of datasets for each task: expert and mixed. Each expert dataset is collected by a high-skill policy, whereas each mixed dataset is collected using both expert and non-expert policies. We found that vanilla behavioural cloning (BC) can learn a very proficient policy with minimal human intervention when trained on expert datasets. Notably, BC outperformed even the most advanced offline reinforcement learning (RL) algorithms. However, when applied to mixed datasets, the performance of BC deteriorates; the performance of offline RL algorithms is likewise less than satisfactory. Upon examining the provided datasets, it was apparent that each mixed dataset contained a significant proportion of expert data, which should enable the training of a proficient BC agent. However, the expert data is not labelled in the datasets. We therefore propose a classifier that identifies the pattern of expert behaviour within a mixed dataset and use it to isolate the expert data. To further boost BC performance, we exploit the geometric symmetry of the arena to augment the training dataset through mathematical transformations. Ultimately, our submission outperformed those of all other participants. |
---|---|
ISSN: | 2377-3766 |
DOI: | 10.1109/LRA.2023.3342559 |
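
The abstract mentions augmenting the training data through the geometric symmetry of the arena but gives no details. The following is only a minimal sketch of what such an augmentation could look like, assuming a rotational symmetry about the vertical axis. The three-fold default (`n_fold=3`), the `(N, 3)` position layout, and the function names are illustrative assumptions, not taken from the letter.

```python
import numpy as np


def rotation_z(angle_rad):
    """Return the 3x3 matrix rotating points about the z (vertical) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def augment_by_rotation(positions, n_fold=3):
    """Create rotated copies of 3D positions (hypothetical helper).

    positions: array of shape (N, 3), one xyz point per row.
    n_fold:    assumed order of the rotational symmetry (an assumption).
    Returns an array of shape (n_fold * N, 3); the first N rows are the
    unrotated originals (rotation by 0 degrees).
    """
    positions = np.asarray(positions, dtype=float)
    copies = []
    for k in range(n_fold):
        rot = rotation_z(2.0 * np.pi * k / n_fold)
        # Row vectors are rotated by multiplying with the transpose.
        copies.append(positions @ rot.T)
    return np.concatenate(copies, axis=0)


positions = np.array([[1.0, 0.0, 0.5]])
augmented = augment_by_rotation(positions)
```

In a real behavioural cloning pipeline, the same transformation would also have to be applied consistently to every observation and action component that depends on the arena frame; this sketch rotates positions only.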