Fast Monocular Depth Estimation via Side Prediction Aggregation with Continuous Spatial Refinement

Bibliographic details
Published in: IEEE Transactions on Multimedia, 2023, Vol. 25, pp. 1204-1216
Main authors: Wu, Jipeng; Ji, Rongrong; Wang, Qiang; Zhang, Shengchuan; Sun, Xiaoshuai; Wang, Yan; Xu, Mingliang; Huang, Feiyue
Format: Article
Language: English
Description
Abstract: Recent works have validated the benefit of integrating spatial information into deep networks to improve pixel-level prediction tasks such as monocular depth estimation. However, how to integrate spatial cues efficiently and robustly remains an open problem. In this paper, we introduce the Side Prediction Aggregation (SPA) method to enhance the embedding of scene structural information from low-level to high-level layers. To improve estimation accuracy, the method is further equipped with a continuous Spatial Refinement Loss (SRL) applied at multiple resolutions with negligible extra computation. In addition, the proposed sequential network can perform adversarial learning at multiple resolutions. This adversarial refinement strategy greatly improves the accuracy of the estimated depth at little extra computational cost. Without using any pre-trained models, our network achieves state-of-the-art accuracy on the KITTI, NYUD V2, and Cityscapes datasets while performing depth estimation online in real time.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2021.3140001