Generalizable Imitation Learning Through Pre-Trained Representations
Format: Article
Language: English
Abstract: In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Vision Transformer (ViT) patch-level embeddings to obtain better generalization when learning from demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data, and evaluation approach are made available to facilitate further study of generalization in imitation learners.
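
To make the pipeline in the abstract concrete, here is a minimal sketch of the keypoint-extraction step it describes: DINO ViT patch embeddings are clustered into semantic concepts, and each concept yields one keypoint. The checkpoint name (`dino_vits8`), the number of clusters, the use of k-means, and the mean-patch-position keypoint readout are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of clustering DINO patch embeddings into semantic keypoints.
# Hypothetical choices (not from the paper): dino_vits8 checkpoint,
# K=8 clusters, k-means, mean-patch-position keypoint readout.
import torch
from sklearn.cluster import KMeans

# Self-supervised DINO ViT-S/8 from the official repository.
model = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
model.eval()

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed RGB frame

with torch.no_grad():
    # get_intermediate_layers returns [CLS] + patch tokens; drop the CLS token.
    tokens = model.get_intermediate_layers(image, n=1)[0]  # (1, 1+N, D)
patches = tokens[0, 1:, :]  # (N, D) with N = (224 / 8) ** 2 = 784 patches

# Cluster appearance features into K semantic concepts.
K = 8
labels = torch.from_numpy(
    KMeans(n_clusters=K, n_init=10).fit_predict(patches.numpy())
)

# One keypoint per concept: the mean grid position of its patches.
grid = 224 // 8  # 28x28 patch grid
ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
coords = torch.stack([xs.flatten(), ys.flatten()], dim=-1).float()  # (N, 2)
keypoints = torch.stack([coords[labels == k].mean(dim=0) for k in range(K)])
print(keypoints.shape)  # torch.Size([8, 2]): one (x, y) per semantic keypoint
```

A downstream behaviour-cloning policy would then condition on these keypoints rather than raw pixels, which is the property the abstract credits for generalization across appearance variations and object types.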
DOI: 10.48550/arxiv.2311.09350