Enhancing Intra- and Inter-Object Part Features for 3-D Object Detection Through LiDAR-Camera Fusion

Bibliographic details
Published in: IEEE Sensors Journal, 2024-08, Vol. 24 (16), p. 27029-27044
Main authors: Wan, Rui; Zhao, Tianyun
Format: Article
Language: English
Description
Summary: Multimodal feature fusion, which combines camera images and light detection and ranging (LiDAR) point clouds, plays a crucial role in achieving reliable 3-D object detection. However, existing detectors typically employ a general approach to improve the overall accuracy of objects and have limitations in utilizing multimodal features to explicitly enhance the detection of sparse objects. Moreover, when projecting points or voxels onto images to capture 2-D features, most multimodal detectors either lose valuable image information due to sparse points or incur high computational costs due to dense voxel fusion. To address these problems, this article proposes an object-level part aggregation (OPA) module that enhances the features of sparse objects by introducing object parts. The OPA module aggregates image and point features for object parts and facilitates intra-object part feature interactions using position encodings between symmetric parts. Additionally, we design a deformable part transformer (DPT) to further enhance the part features of sparse objects by performing inter-object part feature interactions based on structural similarity between objects of the same category. We also develop a keypoint-point decoder (KPD) to efficiently utilize image information by decoding keypoints from both image and point cloud features. These keypoints are dynamically fused with LiDAR points to improve the image-guided object representation. By combining OPA and KPD, we obtained competitive results on the KITTI and nuScenes datasets compared to state-of-the-art methods. Experiments conducted on the KITTI dataset demonstrated that these improvements mainly came from detecting sparse objects. Ablation studies consistently showed performance improvement across various detectors through KPD and OPA integration.
ISSN: 1530-437X, 1558-1748
DOI: 10.1109/JSEN.2024.3424836
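
The summary refers to projecting points onto images to capture 2-D features, the generic fusion step whose limitations motivate the paper. The sketch below illustrates only that generic step with NumPy; it is not the OPA, DPT, or KPD module. The function names, the pinhole projection matrix, and the toy feature map are all assumptions made for illustration, and the points are assumed to be already expressed in the camera frame.

```python
import numpy as np


def project_points(points_xyz, proj):
    """Project N x 3 camera-frame points with a 3 x 4 projection matrix.

    Returns pixel coordinates (u, v) and a mask of points in front of the camera.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])  # N x 4 homogeneous coords
    uvw = homo @ proj.T                              # N x 3
    depth = uvw[:, 2]
    valid = depth > 1e-6                             # drop points behind the camera
    uv = np.zeros((n, 2))
    uv[valid] = uvw[valid, :2] / depth[valid, None]
    return uv, valid


def sample_image_features(feat_map, uv, valid):
    """Nearest-neighbor lookup of an H x W x C feature map at projected pixels."""
    h, w, c = feat_map.shape
    cols = np.rint(uv[:, 0]).astype(int)
    rows = np.rint(uv[:, 1]).astype(int)
    inside = valid & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    out = np.zeros((uv.shape[0], c), dtype=feat_map.dtype)
    out[inside] = feat_map[rows[inside], cols[inside]]
    return out, inside


# Hypothetical pinhole camera: focal length 100 px, principal point (4, 3).
K = np.array([[100.0, 0.0, 4.0],
              [0.0, 100.0, 3.0],
              [0.0, 0.0, 1.0]])
proj = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

# Toy 6 x 8 feature map whose two channels store (row, column) for easy checking.
grid = np.meshgrid(np.arange(6), np.arange(8), indexing="ij")
feat_map = np.stack(grid, axis=-1).astype(float)

points = np.array([[0.0, 0.0, 10.0],    # lands on pixel (u=4, v=3)
                   [0.2, -0.1, 10.0],   # lands on pixel (u=6, v=2)
                   [0.0, 0.0, -5.0],    # behind the camera
                   [5.0, 0.0, 10.0]])   # projects outside the image
point_feats = np.ones((4, 3))           # placeholder per-point LiDAR features

uv, valid = project_points(points, proj)
img_feats, inside = sample_image_features(feat_map, uv, valid)

# Simple concatenation fusion; points without image coverage keep zero image features.
fused = np.concatenate([point_feats, img_feats], axis=1)
```

In this toy setup only two of the four points receive image features, which mirrors the trade-off the summary describes: sampling the image only at sparse point locations leaves most pixels unused, while sampling at every voxel would cover more of the image at a higher computational cost.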