6D Object Pose Estimation with Compact Generalized Non-local Operation

Detailed description

Bibliographic details
Published in:	IEEE Access 2024-11, Vol.12, p.1-1
Main authors:	Jiang, Changhong; Mu, Xiaoqiao; Zhang, Bingbing; Liang, Chao; Xie, Mujun
Format:	Article
Language:	English
Keywords:	Accuracy; Computational modeling; Correlation; Correlations; End-to-end; Feature extraction; Fine-grained Details; Long-range Spatiotemporal; Pose estimation; Predictive models; Representational Power; Solid modeling; Subtle Feature; Three-dimensional displays; Training; YOLO
Online access:	Full text

Description
Abstract:	Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object's 3D bounding box vertices onto a 2D image, facilitating the estimation of the object's 6D pose. The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3508772
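
The ADD metric named in the abstract (average 3D distance of model vertices) can be illustrated with a short NumPy sketch. This is not the authors' evaluation code. The function names `add_distance` and `add_accuracy` are hypothetical, and the acceptance threshold of 10% of the object diameter is an assumption based on common LineMod practice; the abstract does not state which threshold was used.

```python
import numpy as np


def add_distance(vertices, R_gt, t_gt, R_pred, t_pred):
    """Mean Euclidean distance between model vertices under two poses.

    vertices: (N, 3) model points in the object frame.
    R_*: (3, 3) rotation matrices; t_*: (3,) translation vectors.
    """
    gt_points = vertices @ R_gt.T + t_gt        # vertices under ground-truth pose
    pred_points = vertices @ R_pred.T + t_pred  # vertices under predicted pose
    return float(np.linalg.norm(gt_points - pred_points, axis=1).mean())


def add_accuracy(add_values, diameters, threshold=0.1):
    """Fraction of samples whose ADD is below threshold * object diameter.

    The 0.1 default is an assumed convention, not taken from the abstract.
    """
    add_values = np.asarray(add_values, dtype=float)
    diameters = np.asarray(diameters, dtype=float)
    return float(np.mean(add_values < threshold * diameters))
```

With identical rotations and a predicted translation offset by 0.03 along one axis, every vertex moves by the same amount, so `add_distance` returns 0.03 regardless of the model shape.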