BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for Multi-View BEV 3D Object Detection
Published in: IEEE Transactions on Intelligent Vehicles, 2024-01, Vol. 9 (1), pp. 2489-2498
Main authors: , , , , , ,
Format: Article
Language: English
Online access: Order full text
Abstract: Recently, the Bird's-Eye-View (BEV) representation has gained increasing attention in multi-view 3D object detection, demonstrating promising applications in autonomous driving. Although multi-view camera-based systems can be deployed at a low cost, high-performance multi-view BEV object detectors still require significant computational resources. Knowledge Distillation (KD) is one of the most practical techniques for training smaller yet accurate models. Unlike image classification tasks, BEV 3D object detection approaches are more complicated and consist of several components. Therefore, in this article, we propose a unified framework named BEV-LGKD to transfer knowledge in a teacher-student manner. However, directly applying the teacher-student paradigm to BEV features fails to achieve satisfactory results due to the heavy background information in RGB cameras. To solve this problem, we propose to leverage the localization advantage of LiDAR points. Specifically, we transform the LiDAR points into BEV space and generate view-dependent foreground masks for the teacher-student paradigm. Note that our method only uses LiDAR points to guide the KD between RGB models. As the quality of depth estimation is crucial for BEV perception, we further introduce depth distillation into our framework. We have conducted comprehensive experiments on the nuScenes dataset, yielding improvements of up to +3.5% mAP and +6.2% NDS for the student model.
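The LiDAR-guided masking idea in the abstract can be sketched as follows: project LiDAR points onto a BEV grid to form a binary foreground mask, then use that mask to restrict the teacher-student feature loss to occupied cells. This is a minimal illustrative sketch, not the paper's exact formulation; the function names, the grid resolution, the coordinate ranges, and the plain L2 distillation loss are all assumptions for demonstration.

```python
import numpy as np

def lidar_bev_mask(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                   grid=(128, 128)):
    """Rasterize LiDAR points (N, 3+) into a binary BEV occupancy mask.

    Cells of the (H, W) grid that contain at least one LiDAR return are
    marked 1.0; all other (background) cells stay 0.0.
    """
    H, W = grid
    xs = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * W).astype(int)
    ys = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * H).astype(int)
    keep = (xs >= 0) & (xs < W) & (ys >= 0) & (ys < H)  # drop out-of-range points
    mask = np.zeros((H, W), dtype=np.float32)
    mask[ys[keep], xs[keep]] = 1.0
    return mask

def masked_bev_distill_loss(teacher_feat, student_feat, mask, eps=1e-6):
    """Foreground-weighted L2 loss between teacher and student BEV features.

    teacher_feat, student_feat: (C, H, W) feature maps;
    mask: (H, W) with 1.0 at LiDAR-occupied cells. Averaging only over
    masked cells suppresses the heavy background signal of RGB features.
    """
    diff = (teacher_feat - student_feat) ** 2   # (C, H, W) squared error
    per_cell = diff.mean(axis=0)                # (H, W) mean over channels
    return float((per_cell * mask).sum() / (mask.sum() + eps))
```

In a full pipeline this loss term would be added to the student's detection loss during training; here the mask is purely a guidance signal, so the student model itself never consumes LiDAR input at inference time, matching the abstract's claim that LiDAR is used only to guide KD between RGB models.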
ISSN: 2379-8858, 2379-8904
DOI: 10.1109/TIV.2023.3319430