Bayesian cue integration of structure from motion and CNN-based monocular depth estimation for autonomous robot navigation


Bibliographic details
Published in: International journal of intelligent robotics and applications Online, 2022-06, Vol. 6 (2), p. 191-206
Authors: Mumuni, Fuseini; Mumuni, Alhassan
Format: Article
Language: English
Abstract: Monocular depth estimation (MDE) provides information (from a single image) about overall scene layout, and is useful in robotics for autonomous navigation and vision-aided guidance. Advancements in deep learning, particularly self-supervised convolutional neural networks (CNNs), have led to the development of MDE models capable of providing highly accurate per-pixel depth maps. However, these models are typically tuned for specific datasets, leading to sharp performance degradation in real-world scenarios, particularly in robot vision tasks, where the natural environments are too varied and complex to be sufficiently described by standard datasets. Motivated by the approach of biological vision, whose immense success relies on optimal combination of multiple depth cues and knowledge about the underlying environments, we exploit structure from motion (SfM) through optical flow as an additional depth cue, together with prior knowledge about the depth distribution in the environment, to improve monocular depth prediction. However, there is a general incompatibility between the outputs of these models: whereas SfM measures absolute distances, MDE is scale ambiguous, returning only depth ratios. Consequently, we show how the MDE cue can be promoted from an ordinal scale to the same metric scale as SfM, enabling their integration in a Bayesian-optimal manner. Additionally, we generalize the relationship between camera tilt angles and the resulting MDE distortions, and show how this can be used to further improve depth perception robustness and accuracy (by up to 6.2%) for a mobile robot whose heading is subject to arbitrary angular inclinations.
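The two core ideas in the abstract can be sketched in code. This is a minimal illustration, not the paper's actual method: the median-ratio scale alignment is an assumption (a common convention in self-supervised MDE evaluation), and the fusion step is the standard Bayesian-optimal (inverse-variance weighted) combination of two independent Gaussian cues; the paper's exact formulation, priors, and variance estimates may differ.

```python
import numpy as np

def promote_mde_scale(mde_relative, sfm_metric):
    """Promote scale-ambiguous MDE depths to the metric scale of SfM.

    Uses a median-ratio scale factor (an assumed, illustrative choice).
    """
    s = np.median(sfm_metric) / np.median(mde_relative)
    return s * mde_relative

def fuse_cues(d1, var1, d2, var2):
    """Bayesian-optimal fusion of two independent Gaussian depth cues:
    each cue is weighted by its inverse variance (precision)."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * d1 + w2 * d2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)       # always below either input variance
    return fused, fused_var

# Toy usage with made-up numbers:
sfm = np.array([2.0, 4.1, 5.9])       # metric depths from optical-flow SfM
mde = np.array([0.5, 1.0, 1.5])       # scale-ambiguous CNN depth ratios
mde_metric = promote_mde_scale(mde, sfm)
d, v = fuse_cues(mde_metric, 0.25, sfm, 0.09)
```

The fused variance is smaller than that of either cue alone, which is the statistical payoff of cue integration that the biological-vision analogy in the abstract alludes to.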
ISSN: 2366-5971, 2366-598X
DOI: 10.1007/s41315-022-00226-2