SS-Pose: Self-Supervised 6-D Object Pose Representation Learning Without Rendering




Bibliographic details
Published in:	IEEE Transactions on Industrial Informatics 2024-01, Vol. 20 (12), pp. 13665-13675
Authors:	Mu, Fengjun, Huang, Rui, Zhang, Jingting, Zou, Chaobin, Shi, Kecheng, Sun, Shixiang, Zhan, Huayi, Zhao, Pengbo, Qiu, Jing, Cheng, Hong
Format:	Article
Language:	English
Keywords:	Annotations; Constraints; Coordinate transformations; Datasets; Estimation; Industrial applications; Industrial perception; object pose estimation; Pose estimation; Rendering (computer graphics); Representation learning; Representations; Resolvers; Self-supervised learning; Solid modeling; Training; Visual perception

Description
Abstract:	Object pose estimation has extensive applications in various industrial scenarios. However, the heavy reliance on dense 6-D annotations and textured object models has become a significant obstacle to the widespread industrial application of 6-D object pose estimation methods. In this work, we present SS-Pose, a self-supervised learning framework for estimating 6-D object poses without annotated 6-D data or textured models. SS-Pose introduces a coordinate system datum reinitializer stage, which dynamically establishes a sequence-level pose representation datum, and a temporal-spatial constraint resolver module, which obtains the self-supervised learning target through interframe constraints. We also introduce a one-shot cross-coordinate transformation that relates the learned 6-D representation to object poses, so that the representation can be used in real-world tasks. We evaluated SS-Pose on the challenging YCB-Video dataset and the texture-less T-LESS dataset. Our approach achieves competitive performance with significantly lower data dependency, making it suitable for visual perception in industrial applications.
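The abstract does not say how the one-shot cross-coordinate transformation is computed, so the following is only a hypothetical illustration and not the authors' method. It assumes that the learned representation X_t of frame t differs from the true object pose T_t by one fixed, unknown rigid offset C, so T_t = X_t C. Under that assumption, a single frame k with a known pose fixes C = X_k^{-1} T_k (the "one shot"). The relative motion X_t X_s^{-1} also equals T_t T_s^{-1} whatever C is, which is one way interframe constraints could supervise a datum-relative representation without absolute labels. All function names are made up for this sketch, which uses plain 4x4 homogeneous matrices.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t):
    """Invert a rigid transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_new = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_new[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def rigid(yaw, tx, ty, tz):
    """Hypothetical helper: rotation about z by `yaw` plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def one_shot_align(reps, k, pose_k):
    """Assumed model T_t = X_t C: estimate C = X_k^{-1} T_k from one frame, apply to all."""
    offset = matmul(rigid_inverse(reps[k]), pose_k)
    return [matmul(x, offset) for x in reps]


def relative_motion(a, b):
    """Motion from frame a to frame b, b a^{-1}; unaffected by the offset C."""
    return matmul(b, rigid_inverse(a))


def allclose(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


# Synthetic check: hidden offset C, true poses T_t, representations X_t = T_t C^{-1}.
hidden_offset = rigid(0.7, 0.1, -0.2, 0.05)
true_poses = [rigid(0.1 * t, 0.02 * t, 0.0, 0.5) for t in range(5)]
reps = [matmul(T, rigid_inverse(hidden_offset)) for T in true_poses]
recovered = one_shot_align(reps, k=0, pose_k=true_poses[0])
```

In this toy setting, one annotated frame is enough to map every representation back to an object pose. The interframe relative motions agree with the true ones even before alignment, which is why the offset only has to be resolved once.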
ISSN:	1551-3203, 1941-0050
DOI:	10.1109/TII.2024.3424591