InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Abstract: | Bimanual manipulation presents unique challenges compared to unimanual tasks
due to the complexity of coordinating two robotic arms. In this paper, we
introduce InterACT: Inter-dependency aware Action Chunking with Hierarchical
Attention Transformers, a novel imitation learning framework designed
specifically for bimanual manipulation. InterACT leverages hierarchical
attention mechanisms to effectively capture inter-dependencies between dual-arm
joint states and visual inputs. The framework comprises a Hierarchical
Attention Encoder, which processes multi-modal inputs through segment-wise and
cross-segment attention mechanisms, and a Multi-arm Decoder that generates each
arm's action predictions in parallel, while sharing information between the
arms through synchronization blocks by providing the other arm's intermediate
output as context. Our experiments, conducted on various simulated and
real-world bimanual manipulation tasks, demonstrate that InterACT outperforms
existing methods. Detailed ablation studies further validate the significance
of key components, including the impact of CLS tokens, cross-segment encoders,
and synchronization blocks on task performance. We provide supplementary
materials and videos on our project page. |
---|---|
DOI: | 10.48550/arxiv.2409.07914 |
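
The abstract describes an encoder that applies segment-wise attention followed by cross-segment attention. The following is a minimal sketch of that two-stage idea, not the authors' implementation. It assumes that each input segment (left-arm joints, right-arm joints, visual features) is summarized by a prepended CLS token and that the cross-segment stage attends over those CLS tokens only; the abstract mentions CLS tokens but does not spell out this wiring. The function names, the identity projections (no learned weights), the single attention head, and the toy 2-dimensional inputs are all hypothetical choices made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Single-head scaled dot-product self-attention.
    # Assumption: queries, keys, and values are the tokens themselves
    # (identity projections), so there are no learned parameters here.
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def hierarchical_encode(segments, cls_token):
    # Stage 1 (segment-wise): prepend a CLS token to each segment and
    # attend only within that segment.
    encoded = [self_attention([list(cls_token)] + seg) for seg in segments]
    # Stage 2 (cross-segment, assumed wiring): the per-segment CLS
    # summaries attend to each other, sharing information across segments.
    cls_summaries = self_attention([seg[0] for seg in encoded])
    # Write the updated CLS summaries back in front of each segment.
    return [[cls] + seg[1:] for cls, seg in zip(cls_summaries, encoded)]

# Toy inputs: two arm segments and one visual segment, 2-dim tokens.
left_arm = [[1.0, 0.0], [0.5, 0.5]]
right_arm = [[0.0, 1.0], [0.2, 0.8]]
camera = [[0.3, 0.3], [0.9, 0.1], [0.1, 0.9]]
out = hierarchical_encode([left_arm, right_arm, camera], cls_token=[0.0, 0.0])
```

Here `out` holds one list per segment, each starting with its updated CLS summary. A real model would use learned projections, multiple heads, and stacked layers; the sketch only shows how attention can be restricted first to a segment and then to the segment summaries.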