RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Format: Article
Language: English
Abstract:
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
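To make the "visual goal-conditioned decision transformer" concrete, below is a minimal sketch, not the authors' code, of the interface the abstract implies: a decoder-only transformer over a flat token sequence of goal-image tokens followed by interleaved observation and action tokens, trained to predict the next action tokens. The abstract does not specify the image tokeniser or model sizes, so the vocabulary size, dimensions, and token layout here are illustrative assumptions.

```python
# Sketch of a goal-conditioned decision transformer (assumed layout:
# [goal image tokens | obs tokens | action tokens | obs tokens | ...]).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GoalConditionedDecisionTransformer(nn.Module):
    def __init__(self, vocab_size=1024, d_model=256, n_heads=8,
                 n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # shared token table
        self.pos = nn.Embedding(max_len, d_model)      # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)     # next-token logits

    def forward(self, tokens):
        # tokens: (batch, T) integer ids -- goal image tokens first, then
        # interleaved observation/action tokens for the episode so far.
        T = tokens.size(1)
        h = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.backbone(h, mask=causal))

def bc_loss(model, tokens, action_mask):
    # Behavioural cloning on action positions only: predict token t+1
    # from tokens <= t, and mask out positions whose target is not an
    # action token (action_mask is 1 where the token is part of an action).
    logits = model(tokens)[:, :-1]
    targets = tokens[:, 1:]
    per_tok = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1),
        reduction="none").reshape(targets.shape)
    mask = action_mask[:, 1:].float()
    return (per_tok * mask).sum() / mask.sum().clamp(min=1)
```

The same loss applies unchanged whether the episode comes from a simulated or a real arm, which is how a single sequence model can absorb experience from varying observation and action sets once everything is tokenised.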
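The self-improvement loop the abstract describes (fine-tune, self-generate data, retrain) can be summarised in a short sketch. All names below (fine_tune, rollout, train, the Trajectory type) are hypothetical placeholders, not an API from the paper:

```python
# One iteration of the autonomous improvement loop from the abstract.
from typing import Callable, List, Sequence, TypeVar

Agent = TypeVar("Agent")
Trajectory = TypeVar("Trajectory")

def self_improvement_iteration(
    generalist: Agent,
    demos: Sequence[Trajectory],
    fine_tune: Callable[[Agent, Sequence[Trajectory]], Agent],
    rollout: Callable[[Agent], Trajectory],
    train: Callable[[Agent, Sequence[Trajectory]], Agent],
    n_rollouts: int = 1000,
) -> Agent:
    # 1. Adapt the generalist to the target task from a small demo set
    #    (the abstract reports 100-1000 examples).
    specialist = fine_tune(generalist, demos)
    # 2. Deploy the specialist to self-generate experience on the task.
    self_generated: List[Trajectory] = [rollout(specialist)
                                        for _ in range(n_rollouts)]
    # 3. Retrain the generalist on the grown, diversified corpus.
    return train(generalist, list(demos) + self_generated)
```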
DOI: 10.48550/arxiv.2306.11706