Lightweight Language-driven Grasp Detection using Conditional Consistency Model
Format: | Article |
Language: | English |
Abstract: | Language-driven grasp detection is a fundamental yet challenging task in
robotics with various industrial applications. In this work, we present a new
approach for language-driven grasp detection that leverages lightweight
diffusion models to achieve fast inference. By integrating diffusion processes
with grasping prompts in natural language, our method effectively encodes
visual and textual information, enabling more accurate and versatile grasp
positioning that aligns well with the text query. To address the long inference
time of diffusion models, we use the image and text features as conditions for
a consistency model, which reduces the number of denoising timesteps needed
during inference. Extensive experimental results show that our method
outperforms other recent grasp detection methods and lightweight diffusion
models by a clear margin. We further validate our method in real-world robotic
experiments to demonstrate its fast inference. |
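The abstract does not specify the network, parameterization, or sampling schedule, so the following is only a minimal sketch of the general technique it names: multistep sampling from a consistency model whose denoiser is conditioned on image and text features. It assumes the skip-connection parameterization and the constants (`SIGMA_DATA`, `EPS`, `T_MAX`) commonly used in the consistency-model literature; `ToyConditionalNet`, `consistency_fn`, `sample`, and the 5-dimensional grasp vector are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Assumed constants, borrowed from common consistency-model defaults (not from this paper).
SIGMA_DATA = 0.5   # assumed standard deviation of the data
EPS = 0.002        # smallest diffusion time; f(x, EPS, c) must equal x
T_MAX = 80.0       # largest diffusion time

def c_skip(t):
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)

class ToyConditionalNet:
    """Hypothetical stand-in for the learned denoiser F(x, t, c).
    A real model would fuse image and text features; this is a fixed random linear map."""
    def __init__(self, x_dim, c_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(scale=0.1, size=(x_dim, x_dim))
        self.W_c = rng.normal(scale=0.1, size=(c_dim, x_dim))
        self.calls = 0  # counts network evaluations (one per denoising step)

    def __call__(self, x, t, cond):
        self.calls += 1
        return np.tanh(x @ self.W_x + cond @ self.W_c + np.log(t))

def consistency_fn(net, x, t, cond):
    # c_skip(EPS) = 1 and c_out(EPS) = 0, so the boundary condition f(x, EPS, c) = x holds.
    return c_skip(t) * x + c_out(t) * net(x, t, cond)

def sample(net, cond, x_dim, timesteps, seed=0):
    """Multistep consistency sampling: one jump from T_MAX, then optional refinements."""
    rng = np.random.default_rng(seed)
    x = T_MAX * rng.standard_normal((cond.shape[0], x_dim))
    x0 = consistency_fn(net, x, T_MAX, cond)            # single-step estimate
    for tau in timesteps:                                # decreasing times in (EPS, T_MAX)
        noise = rng.standard_normal(x0.shape)
        x_tau = x0 + np.sqrt(tau**2 - EPS**2) * noise    # re-noise to level tau
        x0 = consistency_fn(net, x_tau, tau, cond)       # map back in one network call
    return x0

# Usage with hypothetical features for a batch of 2 (image, text) pairs.
image_feat = np.ones((2, 8))    # hypothetical image embedding
text_feat = np.zeros((2, 8))    # hypothetical embedding of the grasp prompt
cond = np.concatenate([image_feat, text_feat], axis=1)   # shape (2, 16)
net = ToyConditionalNet(x_dim=5, c_dim=16)
grasps = sample(net, cond, x_dim=5, timesteps=[20.0, 2.0])
print(grasps.shape, net.calls)  # (2, 5) 3
```

The point of the sketch is the step count: inference here costs one network evaluation plus one per refinement time, three in total, instead of the tens to hundreds of iterations a standard diffusion sampler typically runs.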
DOI: | 10.48550/arxiv.2407.17967 |