G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
Main authors: | , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Recent advances in imitation learning for 3D robotic manipulation have shown
promising results with diffusion-based policies. However, achieving human-level
dexterity requires seamless integration of geometric precision and semantic
understanding. We present G3Flow, a novel framework that constructs real-time
semantic flow, a dynamic, object-centric 3D semantic representation, by
leveraging foundation models. Our approach uniquely combines 3D generative
models for digital twin creation, vision foundation models for semantic feature
extraction, and robust pose tracking for continuous semantic flow updates. This
integration enables complete semantic understanding even under occlusions while
eliminating manual annotation requirements. By incorporating semantic flow into
diffusion policies, we demonstrate significant improvements in both
terminal-constrained manipulation and cross-object generalization. Extensive
experiments across five simulation tasks show that G3Flow consistently
outperforms existing approaches, achieving up to 68.3% and 50.1% average
success rates on terminal-constrained manipulation and cross-object
generalization tasks respectively. Our results demonstrate the effectiveness of
G3Flow in enhancing real-time dynamic semantic feature understanding for
robotic manipulation policies. |
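The abstract's core idea, attaching semantic features to a canonical digital twin and re-posing the full feature field via tracked object pose, can be sketched minimally as follows. This is a hypothetical illustration, not the paper's actual API; the class and method names (`SemanticFlow`, `update`) are invented for clarity, and it assumes a rigid-body pose update.

```python
import numpy as np

# Hypothetical minimal sketch of the "semantic flow" update described in the
# abstract: per-point semantic features are attached once to a canonical
# digital-twin point cloud, then carried into the current frame by a tracked
# rigid pose. Names here are illustrative, not from the paper.

class SemanticFlow:
    def __init__(self, canonical_points, features):
        # canonical_points: (N, 3) digital-twin geometry
        # features: (N, D) semantic features from a vision foundation model
        self.canonical_points = np.asarray(canonical_points, dtype=float)
        self.features = np.asarray(features, dtype=float)

    def update(self, rotation, translation):
        """Re-pose the full semantic field using the tracked object pose.

        Because features live on the complete canonical model, the returned
        field stays complete even when the current camera view is occluded.
        """
        posed = self.canonical_points @ rotation.T + translation
        return posed, self.features  # features are pose-invariant

# Usage: a toy 2-point object rotated 90 degrees about z and lifted by 0.5.
flow = SemanticFlow([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                    [[1.0, 0.0], [0.0, 1.0]])
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
points, feats = flow.update(Rz, np.array([0.0, 0.0, 0.5]))
# points → [[0, 1, 0.5], [-1, 0, 0.5]]; feats unchanged
```

The design point this illustrates is why such a representation survives occlusion: semantics are bound to the canonical model once, so only the pose needs to be re-estimated per frame.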
---|---|
DOI: | 10.48550/arxiv.2411.18369 |