A Task-Oriented Grasping Framework Guided by Visual Semantics for Mobile Manipulators

Bibliographic Details
Published in: IEEE Transactions on Instrumentation and Measurement, 2024, Vol. 73, pp. 1-13
Main Authors: Zhang, Guangzheng; Wang, Shuting; Xie, Yuanlong; Xie, Sheng Quan; Hu, Yiming; Xiong, Tifan
Format: Article
Language: English
Description
Abstract: Densely cluttered operational environments and the absence of object information hinder mobile manipulators from achieving specific grasping tasks. To address this issue, this article proposes a task-oriented grasping framework guided by visual semantics for mobile manipulators. Using multiple attention mechanisms, we first present a modified DeepLabV3+ model, replacing the backbone network with MobileNetV2 and incorporating a novel attention feature fusion module (AFFM) to build a preprocessing module that produces semantic images efficiently and accurately. A semantic-guided viewpoint adjustment strategy is designed in which the semantic images are used to calculate the optimal viewpoint, enabling the eye-in-hand camera to self-adjust until it encompasses all objects within the task-related area. Based on the improved DeepLabV3+ model and a generative residual convolutional neural network, a task-oriented grasp detection structure is developed to generate a more precise grasp representation for the specific object in densely cluttered scenarios. The effectiveness of the proposed framework is validated through dataset comparison tests and multiple sets of practical grasping experiments. The results demonstrate that the proposed method achieves competitive results versus state-of-the-art (SOTA) methods, attaining an accuracy of 98.3% on the Cornell grasping dataset and a grasping success rate of 91% in densely cluttered scenes.
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2024.3381662
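
The abstract above mentions an attention feature fusion module (AFFM) inside the modified DeepLabV3+ segmentation model, but the record does not describe its internal structure. The PyTorch sketch below is therefore only an illustrative assumption: a generic channel-plus-spatial attention block that fuses low-level encoder features with upsampled high-level features, roughly the role such a module would play between a MobileNetV2 backbone and the DeepLabV3+ decoder. The class name, channel widths, and layer choices are hypothetical, not the authors' design.

```python
# Minimal sketch of an attention-based feature fusion block (assumed layout,
# not the paper's AFFM): project two feature maps to a common width, fuse
# them, then reweight the result with channel and spatial attention.
import torch
import torch.nn as nn


class AttentionFeatureFusion(nn.Module):
    def __init__(self, low_ch: int, high_ch: int, out_ch: int):
        super().__init__()
        # Project low-level (encoder) and high-level (decoder) features
        # to a common channel width before fusion.
        self.proj_low = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.proj_high = nn.Conv2d(high_ch, out_ch, kernel_size=1)
        # Channel attention: squeeze spatially, excite per channel.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: a single-channel mask over the fused map.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(out_ch, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample high-level features to the low-level resolution, then fuse.
        high = nn.functional.interpolate(
            high, size=low.shape[-2:], mode="bilinear", align_corners=False
        )
        fused = self.proj_low(low) + self.proj_high(high)
        fused = fused * self.channel_att(fused)  # reweight channels
        fused = fused * self.spatial_att(fused)  # reweight spatial positions
        return fused


if __name__ == "__main__":
    # Hypothetical channel counts, chosen to resemble MobileNetV2 stages.
    affm = AttentionFeatureFusion(low_ch=24, high_ch=320, out_ch=256)
    low = torch.randn(1, 24, 128, 128)   # early, high-resolution features
    high = torch.randn(1, 320, 32, 32)   # late, low-resolution features
    print(affm(low, high).shape)         # torch.Size([1, 256, 128, 128])
```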