Occlusion-guided multi-modal fusion for vehicle-infrastructure cooperative 3D object detection

Bibliographic details
Published in:	Pattern recognition 2025-01, Vol.157, p.110939, Article 110939
Main authors:	Chu, Huazhen; Liu, Haizhuang; Zhuo, Junbao; Chen, Jiansheng; Ma, Huimin
Format:	Article
Language:	English
Subjects:	3D object detection; Autonomous driving; Occlusion; Vehicle-infrastructure cooperative; Vehicle-vehicle cooperative
Online access:	Full text

Description
Abstract:	In autonomous driving, leveraging sensor data (e.g. camera, LiDAR data) from both the vehicle and the infrastructure significantly improves perception capabilities. However, this integration traditionally results in increased demands on communication bandwidth. To address these challenges, we introduce Fusion2comm, an occlusion-guided feature fusion approach designed to optimize vehicle-infrastructure cooperative 3D object detection. Our innovative strategy employs an intelligent fusion of camera and LiDAR data to enhance the expressiveness of features. Subsequently, it leverages a segmentation model to extract foreground features and utilizes an occlusion-based selection of communication content, effectively easing bandwidth constraints. We propose a multimodal foreground feature fusion architecture that selectively processes and transmits critical information, substantially reducing irrelevant background data transfer. An innovative occlusion confidence-aware communication technique dynamically adjusts communication regions based on occlusion levels, ensuring efficient data exchange. Fusion2comm sets a new benchmark on the DAIR-V2X dataset, achieving an average precision of 71.25% with minimal bandwidth usage of 221.04 bytes. Our comprehensive experimental evaluations confirm that Fusion2comm substantially advances detection precision while simultaneously improving communication efficiency.
• Innovative Occlusion Confidence-Aware Communication. We introduce a technique to quantify occlusion levels, allowing for smart data transmission prioritization. This method optimizes bandwidth by concentrating on the most critically affected areas.
• Strategic Foreground Feature Fusion. Fusion2comm uses selective fusion to combine key foreground features, optimizing bandwidth by cutting down on background data transmission.
• Performance. Experimental results on the DAIR-V2X dataset show that our method significantly improves detection accuracy and communication efficiency.
ISSN:	0031-3203
DOI:	10.1016/j.patcog.2024.110939
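
Illustrative note: the abstract describes selecting communication content from foreground features according to occlusion levels. The sketch below shows one way such a selection could look; it is not the authors' implementation. The grid layout, the scoring rule (foreground confidence multiplied by occlusion confidence), the threshold, the cell budget, and the names `select_cells` and `pack_features` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]


def select_cells(foreground: Grid, occlusion: Grid,
                 threshold: float = 0.5, budget: int = 2) -> List[Cell]:
    """Pick up to `budget` grid cells worth transmitting.

    Assumed scoring rule (not from the paper): a cell scores
    foreground_confidence * occlusion_confidence, so it ranks high only
    if it likely contains an object AND is likely occluded for the receiver.
    """
    scored = []
    for r, (fg_row, occ_row) in enumerate(zip(foreground, occlusion)):
        for c, (fg, occ) in enumerate(zip(fg_row, occ_row)):
            score = fg * occ
            if score >= threshold:
                scored.append((score, r, c))
    # Highest score first; ties broken by grid position for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(r, c) for _, r, c in scored[:budget]]


def pack_features(features: List[List[List[float]]],
                  cells: List[Cell]) -> Dict[Cell, List[float]]:
    """Keep only the feature vectors of the selected cells (the sparse message)."""
    return {(r, c): features[r][c] for r, c in cells}


# Hypothetical 2x2 example: sender-side foreground map and
# receiver-side occlusion map over the same grid.
foreground = [[0.9, 0.1], [0.8, 0.95]]
occlusion = [[0.9, 0.9], [0.2, 0.7]]
features = [[[1.0, 2.0], [0.0, 0.0]], [[3.0, 4.0], [5.0, 6.0]]]

cells = select_cells(foreground, occlusion)
message = pack_features(features, cells)
```

Under these assumptions, background cells and cells the receiver already sees well are dropped, so the message size scales with the number of selected cells instead of the full grid.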