Boundary-induced and scene-aggregated network for monocular depth prediction
Published in: Pattern recognition 2021-07, Vol. 115, p. 107901, Article 107901
Format: Article
Language: English
Online access: Full text
Highlights:
- The problems of predicting the incorrect farthest region and the blurred depth around boundaries are deeply explored.
- The Boundary-induced and Scene-aggregated Network is proposed to address the two issues above.
- A well-designed DCE obtains the correlations between long-distance pixels and the correlations between multi-scale regions.
- To extract the depth boundary, a BUBF module is designed to gradually fuse features of adjacent levels.
- A Stripe Refinement Module (SRM) is designed to refine depth around the boundary.
- Experiments on the NYUD v2, iBims-1, and SUN-RGBD datasets demonstrate the effectiveness of our method.
Abstract: Monocular depth prediction is an important task in scene understanding. It aims to predict the dense depth of a single RGB image. With the development of deep learning, the performance of this task has improved greatly. However, two issues remain unresolved: (1) the deep feature encodes the wrong farthest region in a scene, which leads to a distorted 3D structure of the predicted depth; (2) the low-level features are insufficiently utilized, which makes it even harder to estimate the depth near edges with sudden depth changes. To tackle these two issues, we propose the Boundary-induced and Scene-aggregated network (BS-Net). In this network, the Depth Correlation Encoder (DCE) is first designed to obtain the contextual correlations between the regions in an image and to perceive the farthest region by considering these correlations. Meanwhile, the Bottom-Up Boundary Fusion (BUBF) module is designed to extract an accurate boundary that indicates depth change. Finally, the Stripe Refinement module (SRM) is designed to refine the dense depth induced by the boundary cue, which improves the boundary accuracy of the predicted depth. Experiments on the NYUD v2 and iBims-1 datasets demonstrate the state-of-the-art performance of the proposed approach, and the SUN-RGBD dataset is employed to evaluate the generalization of our method. Code is available at https://github.com/XuefengBUPT/BS-Net.
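The abstract's core idea for the BUBF module is gradual fusion of features from adjacent levels, coarse to fine, to recover boundary detail. The following is a minimal NumPy sketch of that fusion pattern only, not the paper's implementation: the function names, element-wise addition as the fusion operation, and nearest-neighbor upsampling are all assumptions made for illustration.

```python
import numpy as np

def upsample2x(f):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return f.repeat(2, axis=1).repeat(2, axis=2)

def bottom_up_fuse(features):
    """Gradually fuse adjacent-level features, coarsest to finest.

    features: list of (C, H, W) arrays ordered fine -> coarse,
    each level half the spatial size of the previous one.
    Returns a fused map at the finest resolution.
    """
    fused = features[-1]                  # start from the coarsest level
    for f in reversed(features[:-1]):
        # Upsample the running fusion and combine it with the next
        # finer level (element-wise addition is an assumption here).
        fused = f + upsample2x(fused)
    return fused

# Toy usage: three pyramid levels at 8x8, 4x4, and 2x2.
feats = [np.ones((1, 8, 8)), 2 * np.ones((1, 4, 4)), 3 * np.ones((1, 2, 2))]
out = bottom_up_fuse(feats)               # shape (1, 8, 8), all values 6
```

In the actual network, each fusion step would involve learned convolutions rather than plain addition; the sketch only shows the adjacent-level, resolution-matching fusion order.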
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2021.107901