Toward Robust LiDAR-Camera Fusion in BEV Space via Mutual Deformable Attention and Temporal Aggregation
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-07, Vol. 34 (7), pp. 5753-5764
Main authors: , , , ,
Format: Article
Language: English
Abstract: LiDAR and camera are two critical sensors that provide complementary information for accurate 3D object detection. Most works are devoted to improving the detection performance of fusion models on clean, well-curated datasets. However, the point clouds and images collected in real scenarios may be corrupted to varying degrees by sensor malfunctions, which greatly degrades the robustness of the fusion model and poses a threat to safe deployment. In this paper, we first analyze the main shortcoming of most fusion detectors, namely their heavy reliance on the LiDAR branch, and the potential of the bird's-eye-view (BEV) paradigm for handling partial sensor failures. Building on this analysis, we present a robust LiDAR-camera fusion pipeline in a unified BEV space with two novel designs that target four typical LiDAR-camera malfunction cases. Specifically, a mutual deformable attention module is proposed to dynamically model spatial feature relationships and reduce the interference caused by a corrupted modality, and a temporal aggregation module is devised to fully exploit the rich information in the temporal domain. Together with decoupled feature extraction for each modality and holistic fusion in BEV space, the proposed detector, termed RobBEV, works stably regardless of single-modality data corruption. Extensive experiments on the large-scale nuScenes dataset under robust settings demonstrate the effectiveness of our approach.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3366664
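The abstract describes two components at a high level: a mutual deformable attention that lets LiDAR and camera BEV features attend to each other, and a temporal aggregation module over past BEV frames. The record contains no implementation details, so the following is only a minimal PyTorch-style sketch of what such a fusion could look like. All class names, shapes, offset ranges, and hyperparameters are assumptions for illustration; this is not the RobBEV implementation.

```python
# Illustrative sketch only: simplified "mutual" deformable attention between
# LiDAR and camera BEV maps, plus a naive temporal aggregation helper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableCrossBEVAttention(nn.Module):
    """Queries from one modality sample a few offset locations in the other
    modality's BEV map (single head, single level, for brevity)."""

    def __init__(self, dim: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        self.offset_head = nn.Linear(dim, 2 * num_points)  # (dx, dy) per sample point
        self.weight_head = nn.Linear(dim, num_points)       # per-point attention weights
        self.value_proj = nn.Conv2d(dim, dim, 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query_bev: torch.Tensor, value_bev: torch.Tensor) -> torch.Tensor:
        # query_bev, value_bev: (B, C, H, W) BEV feature maps of the two modalities
        B, C, H, W = query_bev.shape
        q = query_bev.flatten(2).transpose(1, 2)            # (B, H*W, C)
        value = self.value_proj(value_bev)                  # (B, C, H, W)

        # Reference grid in normalized [-1, 1] coordinates, one point per BEV cell.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=q.device),
            torch.linspace(-1, 1, W, device=q.device),
            indexing="ij",
        )
        ref = torch.stack((xs, ys), dim=-1).view(1, H * W, 1, 2)

        offsets = self.offset_head(q).view(B, H * W, self.num_points, 2)
        weights = self.weight_head(q).softmax(dim=-1)        # (B, H*W, K)
        locs = (ref + 0.1 * torch.tanh(offsets)).clamp(-1, 1)  # bounded offsets

        # Bilinearly sample the other modality's features at the predicted locations.
        sampled = F.grid_sample(value, locs, align_corners=False)  # (B, C, H*W, K)
        sampled = sampled.permute(0, 2, 3, 1)                      # (B, H*W, K, C)
        fused = (weights.unsqueeze(-1) * sampled).sum(dim=2)       # (B, H*W, C)
        out = self.out_proj(fused) + q                             # residual keeps own features
        return out.transpose(1, 2).reshape(B, C, H, W)


class MutualBEVFusion(nn.Module):
    """Both directions: LiDAR queries camera BEV and vice versa, then fuse."""

    def __init__(self, dim: int):
        super().__init__()
        self.lidar_from_cam = DeformableCrossBEVAttention(dim)
        self.cam_from_lidar = DeformableCrossBEVAttention(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, 3, padding=1)

    def forward(self, lidar_bev: torch.Tensor, cam_bev: torch.Tensor) -> torch.Tensor:
        lidar_ref = self.lidar_from_cam(lidar_bev, cam_bev)
        cam_ref = self.cam_from_lidar(cam_bev, lidar_bev)
        return self.fuse(torch.cat((lidar_ref, cam_ref), dim=1))


def aggregate_temporal(bev_frames):
    """Naive temporal aggregation: average the current and (already ego-motion
    aligned) past BEV frames; warping/alignment is omitted for brevity."""
    return torch.stack(bev_frames, dim=0).mean(dim=0)
```

The tanh-bounded offsets and the residual connection are one plausible way to let a clean modality retain its own features when the other modality's BEV map is corrupted; the actual RobBEV design may differ in the number of heads and levels, the offset parameterization, and how past frames are ego-motion aligned before aggregation.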