Depth Estimation From Surface-Ground Correspondence for Monocular 3D Object Detection

Bibliographic details
Published in:	IEEE Transactions on Intelligent Transportation Systems, 2024-11, Vol. 25 (11), pp. 16312-16322
Main authors:	Ji, Yinshuai; Xu, Jinhua
Format:	Article
Language:	English
Subjects:	automatic driving; depth estimation; Estimation; Feature extraction; ground depth; Head; Monocular 3D object detection; Object detection; Task analysis; Three-dimensional displays; Uncertainty

Description
Summary:	Monocular 3D object detection has attracted great attention due to its simplicity and low cost. However, recovering object location in 3D space from a monocular image is challenging because the depth information is lost. Estimating the instance depth is therefore the core problem to be solved. Intuitively, the ground depth is continuous and global in essence, and independent of the objects in the scene. Ground depth estimation can therefore be easier and more accurate than object depth estimation. Inspired by this, we propose to map a set of surface points of an object onto the ground plane and to decompose the object depth problem into ground depth estimation and surface point height estimation. During the training stage, dense ground depth labels are provided by the ground truth (GT) surface depths of objects from LiDAR data. In the inference stage, surface depths are recovered by querying the ground depth map. As a result, a set of instance depth candidates is obtained, and the final instance depth is assembled according to their uncertainties. In addition, since most of the mapped ground points are occluded by the object, which may mislead network learning, we devise a depth expansion strategy to extend the ground depth labels. The proposed method, MonoSGC, achieves state-of-the-art (SOTA) performance on the KITTI and Waymo datasets. Ablation studies demonstrate the effectiveness of the proposed components. The code and model are released at https://github.com/JiYinshuai/MonoSGC
ISSN:	1524-9050, 1558-0016
DOI:	10.1109/TITS.2024.3411159
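
Illustrative sketch:	The summary describes recovering surface depths by querying the ground depth map and then assembling the final instance depth from the candidates according to their uncertainties. The Python sketch below illustrates that idea only; it is not the authors' implementation. The function names, the nearest-pixel lookup, and the inverse-uncertainty weighting are all assumptions made for illustration, since this record does not state how the query or the assembly is actually performed.

```python
def query_ground_depth(ground_depth_map, pixels):
    """Look up ground depth at mapped ground pixels (hypothetical helper).

    ground_depth_map: list of rows, each a list of depth values in metres.
    pixels: iterable of (u, v) image coordinates, u = column, v = row.
    Assumption: nearest-pixel lookup, with coordinates clamped to the image.
    """
    height = len(ground_depth_map)
    width = len(ground_depth_map[0])
    depths = []
    for u, v in pixels:
        col = min(max(int(round(u)), 0), width - 1)
        row = min(max(int(round(v)), 0), height - 1)
        depths.append(ground_depth_map[row][col])
    return depths


def fuse_depth_candidates(depths, uncertainties, eps=1e-6):
    """Combine depth candidates into a single instance depth.

    Assumption: each candidate is weighted by the inverse of its predicted
    uncertainty, so more certain candidates contribute more to the result.
    """
    if len(depths) == 0 or len(depths) != len(uncertainties):
        raise ValueError("need equally many depths and uncertainties")
    weights = [1.0 / (s + eps) for s in uncertainties]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, depths)) / total


if __name__ == "__main__":
    # Toy 3x4 ground depth map: depth grows towards the top of the image.
    toy_map = [
        [30.0, 30.0, 30.0, 30.0],
        [20.0, 20.0, 20.0, 20.0],
        [10.0, 10.0, 10.0, 10.0],
    ]
    # Hypothetical ground pixels that three surface points were mapped to.
    candidates = query_ground_depth(toy_map, [(1, 2), (2, 2), (2, 1)])
    # Hypothetical per-candidate uncertainties (larger = less trusted).
    print(fuse_depth_candidates(candidates, [0.5, 0.5, 4.0]))
```

Inverse-uncertainty weighting is one common way to combine several estimates of the same quantity; the released code at the URL in the summary is the reference for what MonoSGC actually does.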