Real-time Depth Estimation Using Recurrent CNN with Sparse Depth Cues for SLAM System
Depth map has been utilized for refinement of geometric information in a variety of fields such as 3D reconstruction and pose estimation in SLAM system where ill-posed problems are occurred. Currently, as learning-based approaches are successfully introduced throughout many problems of vision-based...
Gespeichert in:
Veröffentlicht in: | International journal of control, automation, and systems 2020, Automation, and Systems, 18(1), , pp.206-216 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Depth map has been utilized for refinement of geometric information in a variety of fields such as 3D reconstruction and pose estimation in SLAM system where ill-posed problems are occurred. Currently, as learning-based approaches are successfully introduced throughout many problems of vision-based fields, several depth estimation algorithms based on CNN are suggested, which only conduct training of spatial information. Since an image sequence or video used for SLAM system tends to have temporal information, this paper proposes a recurrent CNN architecture for SLAM system to estimate depth map by exploring not only spatial but also temporal information by using convolutional GRU cell, which is constructed to remember weights of past convolutional layers. Furthermore, this paper proposes using additional layers that preserve structure of scenes by utilizing sparse depth cues obtained from SLAM system. The sparse depth cues are produced by projecting reconstructed 3D map into each camera frame, and the sparse cues help to predict accurate depth map avoiding ambiguity of depth map generation of untrained structures in latent space. Despite accuracy of depth cues according to monocular SLAM system degrades than stereo SLAM system, the proposed masking approach, which takes the confidence of depth cues with regard to a relative camera pose between current frame and previous frame, retains the performance of the proposed system with the proposed adaptive regularization in loss function. In the training phase, by preprocessing exponential quantization of ground-truth depth map to eliminate the ill-effects of the captured large distances, the depth map prediction of the proposed system improves more than other baseline methods with accomplishment of real-time system. We expect that this proposed system can be used in SLAM system to refine geometric information for more accurate 3D reconstruction and pose estimation, which are essential parts for robust navigation system of robots. |
---|---|
ISSN: | 1598-6446 2005-4092 |
DOI: | 10.1007/s12555-019-0350-8 |