Bayesian cue integration of structure from motion and CNN-based monocular depth estimation for autonomous robot navigation
Published in: International journal of intelligent robotics and applications Online, 2022-06, Vol. 6 (2), p. 191–206
Main authors: ,
Format: Article
Language: English
Online access: Full text
Abstract: Monocular depth estimation (MDE) provides information (from a single image) about overall scene layout, and is useful in robotics for autonomous navigation and vision-aided guidance. Advancements in deep learning, particularly self-supervised convolutional neural networks (CNNs), have led to the development of MDE models capable of providing highly accurate per-pixel depth maps. However, these models are typically tuned for specific datasets, leading to sharp performance degradation in real-world scenarios, particularly in robot vision tasks, where natural environments are too varied and complex to be sufficiently described by standard datasets. Motivated by the approach of biological vision, whose immense success relies on optimal combination of multiple depth cues and knowledge about the underlying environments, we exploit structure from motion (SfM) through optical flow as an additional depth cue, together with prior knowledge about the depth distribution in the environment, to improve monocular depth prediction. However, there is a general incompatibility between the outputs of these models: whereas SfM measures absolute distances, MDE is scale ambiguous, returning only depth ratios. Consequently, we show how it is possible to promote the MDE cue from an ordinal scale to the same metric scale as SfM, thus enabling their integration in a Bayesian optimal manner. Additionally, we generalize the relationship between camera tilt angles and the resulting MDE distortions, and show how this can be used to further improve depth perception robustness and accuracy (by up to 6.2%) for a mobile robot whose heading is subject to arbitrary angular inclinations.
ISSN: 2366-5971, 2366-598X
DOI: 10.1007/s41315-022-00226-2