NVR-Net: Normal Vector Guided Regression Network for Disentangled 6D Pose Estimation
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-02, Vol. 34 (2), pp. 1098-1113
Format: Article
Language: English
Abstract: Monocular 6D pose estimation for objects is an essential but challenging task commonly applied in computer vision and robotics. Existing two-stage methods solve for rotations with Perspective-n-Point (PnP), which still incorporates translations, resulting in degraded accuracy. In contrast, direct regression methods adopt Convolutional Neural Networks (CNNs) to solve for rotations and translations jointly, but suffer from a performance gap in rotation accuracy. In this article, we propose a novel Normal Vector guided Regression Network (NVR-Net) that directly regresses the 6D pose from a single RGB image under the guidance of 3D normal vectors. Specifically, we design a novel Orientation-Aware Feature (OAF) for pose estimation: it consists of two corresponding sets of 3D normal vectors that thoroughly disentangle rotation estimation from translation estimation. We then introduce a CNN to predict a dense pixelwise representation of the OAF without viewpoint ambiguity. To estimate rotations and translations individually from the OAF, we propose a novel Pose from Normal Vectors (PNV) head network guided by a differentiable closed-form solution. Finally, extensive experiments on three common benchmarks demonstrate that our approach outperforms state-of-the-art methods in rotation accuracy and closes the gap between indirect and end-to-end methods. Moreover, our method estimates the 6D pose of a single object in an RGB image in real time.
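The abstract mentions recovering rotation from two corresponding sets of 3D normal vectors via a differentiable closed-form solution. The paper's actual PNV head is not reproduced here, but the classical closed-form answer to this subproblem is the orthogonal Procrustes / Kabsch solution via SVD; the sketch below illustrates that idea. The function name and array conventions are illustrative assumptions, not the authors' API.

```python
import numpy as np

def rotation_from_normals(n_obj, n_cam):
    """Closed-form best-fit rotation between corresponding unit normal
    vector sets (Kabsch / orthogonal Procrustes, a standard technique
    the paper's differentiable solver resembles; this is a sketch).

    n_obj, n_cam: (N, 3) arrays of corresponding unit normals.
    Returns R (3x3) such that n_cam[i] ~= R @ n_obj[i].
    """
    H = n_cam.T @ n_obj                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    return U @ D @ Vt
```

Because the solution is composed of differentiable matrix operations (matmul and SVD), gradients can flow through it during training, which is what makes a "differentiable closed-form solution" usable inside an end-to-end network.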
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2023.3290617