A concise but high-performing network for image guided depth completion in autonomous driving
Published in: Knowledge-Based Systems, 2024-07, Vol. 296, p. 111877, Article 111877
Main authors: , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Depth completion is a crucial task in autonomous driving, aiming to convert a sparse depth map into a dense depth prediction. The sparse depth map serves as a partial reference for the actual depth, and the fusion of RGB images is frequently employed to augment the completion process owing to its inherent richness in semantic information. Image-guided depth completion confronts three principal challenges: (1) the effective fusion of the two modalities; (2) the enhancement of depth information recovery; and (3) the realization of real-time predictive capabilities requisite for practical autonomous driving scenarios. In response to these challenges, we propose a concise but high-performing network, named CHNet, to achieve high-performance depth completion with an elegant and straightforward architecture. Firstly, we use a fast guidance module to fuse the two sensor features, harnessing abundant auxiliary information derived from the color space. Unlike the prevalent complex guidance modules, our approach adopts an intuitive and cost-effective strategy. In addition, we find and analyze the optimization inconsistency problem for observed and unobserved positions. To mitigate this challenge, we introduce a decoupled depth prediction head, tailored to better discern and predict depth values for both valid and invalid positions, incurring minimal additional inference time. Capitalizing on the dual-encoder and single-decoder architecture, the simplicity of CHNet facilitates an optimal balance between accuracy and computational efficiency. In benchmark evaluations on the KITTI depth completion dataset, CHNet demonstrates competitive performance metrics and inference speeds relative to contemporary state-of-the-art methodologies. To assess the generalizability of our approach, we extend our evaluations to the indoor NYUv2 dataset, where CHNet continues to yield impressive outcomes. The code of this work will be available at https://github.com/lmomoy/CHNet.
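The decoupled prediction idea in the abstract can be illustrated with a minimal sketch (hypothetical code, not the authors' implementation; it assumes a nonzero value in the sparse map marks an observed pixel): the head produces two depth estimates, and the sparse-depth validity mask selects per pixel between the branch tuned for observed positions and the one tuned for unobserved positions.

```python
import numpy as np

def fuse_decoupled(pred_observed, pred_unobserved, sparse_depth):
    """Combine two branch predictions into one dense depth map.

    pred_observed / pred_unobserved: dense outputs of the two (hypothetical)
    prediction branches; sparse_depth: sparse LiDAR input where 0 means
    "no reading" at that pixel.
    """
    valid = sparse_depth > 0  # observed positions carry a nonzero depth
    return np.where(valid, pred_observed, pred_unobserved)

# Toy example: a 2x2 image with two LiDAR hits on the diagonal.
sparse = np.array([[0.5, 0.0],
                   [0.0, 2.0]])
dense = fuse_decoupled(np.full((2, 2), 1.0), np.full((2, 2), 9.0), sparse)
# dense is [[1., 9.], [9., 1.]]
```

In this sketch the observed-position branch wins wherever the LiDAR provided a measurement, which mirrors the paper's motivation that valid and invalid positions benefit from being optimized separately.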
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2024.111877