Text-driven object affordance for guiding grasp-type recognition in multimodal robot teaching

Bibliographic details
Published in: Machine Vision and Applications, 2023-07, Vol. 34 (4), p. 58, Article 58
Authors: Wake, Naoki; Saito, Daichi; Sasabuchi, Kazuhiro; Koike, Hideki; Ikeuchi, Katsushi
Format: Article
Language: English
Description
Abstract: In robot teaching, the grasping strategies that users teach to robots are critical information, because these strategies contain the implicit knowledge necessary to successfully perform a series of manipulations; however, little practical knowledge exists on how to utilize linguistic information to support grasp-type recognition in multimodal teaching. This study focused on the effects of text-driven object affordance, a prior distribution of grasp types for each object, on image-based grasp-type recognition. To this end, we created datasets of first-person grasping-hand images labeled with grasp types and object names, and tested whether object affordance enhanced the performance of image-based recognition. We evaluated two scenarios, with real and illusory objects to be grasped, considering a teaching condition in mixed reality, where the lack of visual object information can make image-based recognition challenging. The results show that object affordance guided image-based recognition in both scenarios, increasing recognition accuracy by (1) excluding unlikely grasp types from the candidates and (2) enhancing likely grasp types. Additionally, this "enhancing" effect was more pronounced the greater the grasp-type bias for each object in the test dataset. These results indicate the effectiveness of object affordance for guiding grasp-type recognition in multimodal robot teaching applications. The contributions of this study are (1) demonstrating the effectiveness of object affordance in guiding grasp-type recognition both with and without real objects in the images, (2) identifying the conditions under which the merits of object affordance are most pronounced, and (3) providing a dataset of first-person grasping images labeled with the possible grasp types for each object.
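
The guidance described in the abstract, excluding unlikely grasp types and enhancing likely ones with a per-object prior, can be illustrated with a minimal sketch. The grasp vocabulary, prior values, and function names below are hypothetical illustrations, not the authors' implementation; the fusion shown is a simple Bayes-style reweighting (multiply the image-based distribution by the text-driven prior and renormalize), with hard exclusion as a masked variant.

import numpy as np

# Hypothetical grasp-type vocabulary (the paper's actual taxonomy is not
# reproduced here).
GRASP_TYPES = ["power", "precision", "lateral", "tripod"]

# Hypothetical text-driven object affordance: a prior distribution of
# grasp types for each object name.
AFFORDANCE_PRIOR = {
    "cup":   np.array([0.62, 0.15, 0.03, 0.20]),
    "pen":   np.array([0.02, 0.48, 0.10, 0.40]),
    "plate": np.array([0.10, 0.10, 0.70, 0.10]),
}

def fuse_with_affordance(image_probs, object_name, mode="enhance", eps=0.05):
    """Guide image-based grasp-type probabilities with an object prior.

    mode="exclude": drop grasp types whose prior probability is below eps.
    mode="enhance": reweight image probabilities by the prior.
    """
    prior = AFFORDANCE_PRIOR[object_name]
    if mode == "exclude":
        fused = image_probs * (prior >= eps)   # mask out unlikely types
    else:
        fused = image_probs * prior            # soft reweighting
    return fused / fused.sum()                 # renormalize to a distribution

# Example: the image-based recognizer is unsure between "power" and
# "lateral", but the prior for "cup" strongly favors a power grasp.
image_probs = np.array([0.35, 0.15, 0.35, 0.15])
print(fuse_with_affordance(image_probs, "cup", mode="exclude"))
print(fuse_with_affordance(image_probs, "cup", mode="enhance"))

Under a uniform class marginal, the multiplicative fusion amounts to Bayes' rule treating the image-based output and the text-driven prior as independent evidence; the exclusion variant is a hard special case of the same idea.
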
ISSN: 0932-8092
eISSN: 1432-1769
DOI: 10.1007/s00138-023-01408-z