Rotation adaptive grasping estimation network oriented to unknown objects based on novel RGB-D fusion strategy

Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, 2023-04, Vol. 120, p. 105842, Article 105842
Authors: Tian, Hongkun; Song, Kechen; Li, Song; Ma, Shuai; Yan, Yunhui
Format: Article
Language: English
Abstract: Accurate grasping estimation is a prerequisite for, and key to, accurate robotic grasping. Although RGB and depth (RGB-D) data are common sources, existing fusion strategies hardly exploit the advantages of both modalities while suppressing their disadvantages. In addition, existing methods rely mainly on data augmentation to achieve spatial and rotational adaptation, which cannot fundamentally solve the problem. Therefore, this paper proposes a rotation-adaptive grasping estimation framework based on a novel RGB-D fusion strategy. Specifically, RGB and depth features are fused in stages with shared weights using the proposed Multi-step Weight-learning Fusion (MWF) strategy. The spatial position encoding is learned autonomously by the proposed Rotation Adaptive Conjoin (RAC) encoder, achieving spatial and rotational adaptivity for unknown objects in unknown poses. In addition, a Multi-dimensional Interaction-guided Attention (MIA) decoding strategy based on the fused multiscale features is proposed to highlight informative elements and suppress invalid ones. The method has been validated on the Cornell and Jacquard grasping datasets with cross-validation accuracies of 99.3% and 94.6%, respectively. The single-object and multi-object grasping success rates on a real robot platform are 95.625% and 87.5%, respectively. This performance compares favorably with state-of-the-art methods.

Highlights:
• The proposed RAC encoder makes the robot adaptive to unknown object poses.
• The proposed MIA decoder makes the robot robust when grasping unknown objects.
• The proposed MWF strategy fully exploits RGB-D multimodal information.
• The proposed network achieves a high accuracy rate for grasping unknown objects.
• The proposed method is doubly validated on public datasets and a real robot system.
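The abstract describes staged RGB-D fusion with shared weights (MWF). The paper's exact formulation is not given here, so the following is only a minimal sketch, assuming a simple gated blend in which a single learnable logit (a hypothetical stand-in for the shared fusion weights) is reused across encoder stages:

```python
import numpy as np

def sigmoid(x):
    """Map a real-valued logit to a blending weight in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def fuse_stage(rgb_feat, depth_feat, shared_logit):
    """Blend RGB and depth feature maps with a learned shared weight.

    `shared_logit` is a single parameter reused across all stages;
    in training it would be updated by backpropagation. This is an
    illustrative simplification, not the paper's exact MWF operator.
    """
    w = sigmoid(shared_logit)                # blending weight in (0, 1)
    return w * rgb_feat + (1.0 - w) * depth_feat

# Toy multi-stage fusion: three encoder stages share one fusion weight.
rng = np.random.default_rng(0)
shared_logit = 0.5                           # would be learned in practice
stages = [(rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
          for _ in range(3)]
fused = [fuse_stage(rgb, depth, shared_logit) for rgb, depth in stages]
print(len(fused), fused[0].shape)            # 3 (4, 4)
```

Sharing the fusion parameter across stages, as sketched here, keeps the RGB/depth trade-off consistent throughout the encoder instead of letting each stage drift to its own balance.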
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2023.105842