VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
Saved in:

Format: Article
Language: English
Online access: Order full text
Abstract: Robotic grasping faces new challenges in human-robot interaction scenarios. We consider the task in which a robot grasps a target object designated by a human's language directive. The robot not only needs to locate the target based on vision-and-language information, but also needs to predict reasonable grasp pose candidates for objects observed from various views and in various postures. In this work, we propose a novel interactive grasp policy, named Visual-Lingual-Grasp (VL-Grasp), to grasp the target specified by human language. First, we build a new, challenging visual grounding dataset that provides functional training data for robotic interactive perception in indoor environments. Second, we propose a 6-DoF interactive grasp policy that combines visual grounding with 6-DoF grasp pose detection to extend the universality of interactive grasping. Third, we design a grasp pose filter module to enhance the performance of the policy. Experiments demonstrate the effectiveness and extensibility of VL-Grasp in the real world. VL-Grasp achieves a success rate of 72.5% across different indoor scenes. The code and dataset are available at https://github.com/luyh20/VL-Grasp.
DOI: 10.48550/arxiv.2308.00640
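
The pipeline the abstract describes — ground the referred object in the image, detect 6-DoF grasp candidates, then filter the candidates down to the grounded target — can be illustrated with a short sketch of the filtering step alone. The sketch below is a minimal, hypothetical illustration, assuming a pinhole camera model and a 2D bounding box from the grounding stage; the function names (`project_points`, `filter_grasps_by_box`) and the box-based criterion are assumptions for illustration, not the authors' actual filter module (see the repository above for that).

```python
"""Hedged sketch of grasp-pose filtering: keep only 6-DoF grasp candidates
whose grasp centers project into the image region returned by visual
grounding. Grounding and grasp detection are assumed to happen upstream;
all names and interfaces here are illustrative, not the VL-Grasp code."""

import numpy as np


def project_points(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points to Nx2 pixel coordinates using a
    3x3 pinhole intrinsics matrix K."""
    uvw = (K @ points_cam.T).T          # N x 3 homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide


def filter_grasps_by_box(grasp_centers: np.ndarray,
                         grasp_scores: np.ndarray,
                         box: tuple,
                         K: np.ndarray) -> np.ndarray:
    """Return indices of grasps whose centers project inside the grounded
    2D box (x1, y1, x2, y2), sorted by descending detector score."""
    uv = project_points(grasp_centers, K)
    x1, y1, x2, y2 = box
    inside = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    idx = np.nonzero(inside)[0]
    return idx[np.argsort(-grasp_scores[idx])]


if __name__ == "__main__":
    # Synthetic example: camera intrinsics, two grasp candidates, and a
    # box a grounding model might return for a language-referred object.
    K = np.array([[600.0, 0.0, 320.0],
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])
    centers = np.array([[0.05, 0.00, 0.60],   # projects inside the box
                        [0.40, 0.20, 0.60]])  # projects far off-target
    scores = np.array([0.7, 0.9])
    keep = filter_grasps_by_box(centers, scores, (250, 180, 420, 320), K)
    print("grasps kept, best first:", keep)   # -> [0]
```

Projecting grasp centers into the grounded region is one simple way to couple visual grounding with a generic grasp detector; the paper's actual filter module may apply additional geometric or score-based criteria.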