3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose from Monocular Video
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Depth estimation and ego-motion estimation are essential for the localization and
navigation of autonomous robots and autonomous driving. Recent studies make it
possible to learn per-pixel depth and ego-motion from unlabeled
monocular video. A novel unsupervised training framework is proposed with 3D
hierarchical refinement and augmentation using explicit 3D geometry. In this
framework, the depth and pose estimations are hierarchically and mutually
coupled to refine the estimated pose layer by layer. The intermediate view
image is proposed and synthesized by warping the pixels in an image with the
estimated depth and coarse pose. Then, the residual pose transformation can be
estimated from the new view image and the image of the adjacent frame to refine
the coarse pose. The iterative refinement is implemented in a differentiable
manner, so that the whole framework can be optimized as a whole.
In addition, a new image augmentation method is proposed for pose estimation:
a new view image is synthesized, which augments the pose in 3D
space while yielding a new augmented 2D image. Experiments on KITTI demonstrate
that our depth estimation achieves state-of-the-art performance and even
surpasses recent approaches that utilize other auxiliary tasks. Our visual
odometry outperforms all recent unsupervised monocular learning-based methods
and achieves performance competitive with the geometry-based method ORB-SLAM2
with back-end optimization. |
---|---|
DOI: | 10.48550/arxiv.2112.03045 |
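
The geometry summarized above (warp pixels with the estimated depth and a coarse pose, then fold a residual pose back into the coarse one) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `pose_matrix`, `reproject`, and `refine` are hypothetical, a pinhole camera with intrinsics `(fx, fy, cx, cy)` is assumed, and left-multiplying the residual onto the coarse pose is an assumed convention that the record does not specify.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose_matrix(yaw, tx, ty, tz):
    """4x4 rigid transform: rotation about the y axis by yaw (radians), then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s, tx],
            [0.0, 1.0, 0.0, ty],
            [-s, 0.0, c, tz],
            [0.0, 0.0, 0.0, 1.0]]


def reproject(u, v, depth, K, T):
    """Back-project pixel (u, v) using its depth, move the 3D point by T, project it again.

    K is (fx, fy, cx, cy) for an assumed pinhole camera.
    """
    fx, fy, cx, cy = K
    # Pixel plus depth -> homogeneous 3D point in the source camera frame.
    p = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth, 1.0]
    # Apply the rigid transform (first three rows are enough).
    q = [sum(T[i][k] * p[k] for k in range(4)) for i in range(3)]
    # Perspective projection back onto the image plane.
    return (fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy)


def refine(coarse, residual):
    """Compose a residual pose with a coarse pose (assumed order: residual applied after coarse)."""
    return matmul4(residual, coarse)


if __name__ == "__main__":
    K = (100.0, 100.0, 50.0, 50.0)
    coarse = pose_matrix(0.0, 1.0, 0.0, 0.0)
    residual = pose_matrix(0.0, 0.5, 0.0, 0.0)
    print(reproject(50.0, 50.0, 10.0, K, coarse))
    print(reproject(50.0, 50.0, 10.0, K, refine(coarse, residual)))
```

Repeating `refine` once per layer gives the layer-by-layer refinement the summary describes. In a learned setting the same operations would be written with differentiable tensor operations so that gradients reach both the depth and the pose networks.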