CroMo: Cross-Modal Learning for Monocular Depth Estimation
Format: Article
Language: English
Abstract: Learning-based depth estimation has witnessed recent progress in multiple
directions; from self-supervision using monocular video to supervised methods
offering the highest accuracy. Complementary to supervision, further boosts to
performance and robustness are gained by combining information from multiple
signals. In this paper we systematically investigate key trade-offs associated
with sensor and modality design choices as well as related model training
strategies. Our study leads us to a new method, capable of connecting
modality-specific advantages from polarisation, Time-of-Flight and
structured-light inputs. We propose a novel pipeline capable of estimating
depth from monocular polarisation, for which we evaluate various training
signals. The inversion of differentiable analytic models thereby connects scene
geometry with polarisation and ToF signals and enables self-supervised and
cross-modal learning. In the absence of existing multimodal datasets, we
examine our approach with a custom-made multi-modal camera rig and collect
CroMo, the first dataset to consist of synchronized stereo polarisation,
indirect ToF and structured-light depth, captured at video rates. Extensive
experiments on challenging video scenes confirm both qualitative and
quantitative pipeline advantages, where we outperform competitive
monocular depth estimation methods.
DOI: 10.48550/arxiv.2203.12485