RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes
Main author: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Unsupervised methods have shown promising results on monocular depth
estimation. However, the training data must be captured in scenes without
moving objects. To push the envelope of accuracy, recent methods tend to
increase their model parameters. In this paper, an unsupervised learning
framework is proposed to jointly predict monocular depth and complete 3D motion,
including the motions of both moving objects and the camera. (1) Recurrent
modulation units are used to adaptively and iteratively fuse encoder and
decoder features. This improves single-image depth inference without inflating
the number of model parameters. (2) Instead of using a single set of filters for
upsampling, multiple sets of filters are devised for residual upsampling.
This facilitates the learning of edge-preserving filters and leads to
improved performance. (3) A warping-based network is used to estimate a motion
field of moving objects without using semantic priors. This removes the
requirement of scene rigidity and allows general videos to be used for
unsupervised learning. The motion field is further regularized by an
outlier-aware training loss. Although the depth model uses only a single image
at test time and has only 2.97M parameters, it achieves state-of-the-art
results on the KITTI and Cityscapes benchmarks. |
---|---|
DOI: | 10.48550/arxiv.2303.04456 |
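
The warping idea in point (3) of the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a standard pinhole camera and the common formulation in which a pixel is back-projected using its depth, moved by the camera motion plus a per-pixel 3D object motion, and projected into the other view. The function name `reproject_pixel` and all of its parameters are hypothetical.

```python
def reproject_pixel(u, v, depth, fx, fy, cx, cy, R, t, motion=(0.0, 0.0, 0.0)):
    """Map one source pixel (u, v) to its location in a second view.

    Assumptions (not taken from the paper): pinhole intrinsics fx, fy, cx, cy;
    camera motion given as a 3x3 rotation R (nested lists) and translation t;
    `motion` is the 3D displacement of the object seen at this pixel.
    """
    # Back-project the pixel to a 3D point in the source camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the camera motion, then add the object's own 3D motion.
    moved = [
        sum(R[i][j] * point[j] for j in range(3)) + t[i] + motion[i]
        for i in range(3)
    ]
    # Project the moved point back onto the image plane.
    return fx * moved[0] / moved[2] + cx, fy * moved[1] / moved[2] + cy


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Static scene, static camera: the pixel stays where it is.
static = reproject_pixel(3, 1, 2.0, 100.0, 100.0, 2.0, 1.5, IDENTITY, (0.0, 0.0, 0.0))

# Same camera, but the object at this pixel moves 0.02 units along x.
dynamic = reproject_pixel(3, 1, 2.0, 100.0, 100.0, 2.0, 1.5, IDENTITY,
                          (0.0, 0.0, 0.0), motion=(0.02, 0.0, 0.0))
```

With a zero motion field this reduces to the rigid-scene reprojection. A nonzero `motion` shifts the projected location, which is why a per-pixel motion field lets view synthesis account for moving objects.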