SVDnet: Singular Value Control and Distance Alignment Network for 3D Object Detection

Detailed description

Bibliographic details
Published in:	IEEE Transactions on Intelligent Transportation Systems, 2023-09, Vol. 24 (9), p. 1-15
Main authors:	Chang, Ming-Jen; Cheng, Chih-Jen; Hsiao, Ching-Chun; Li, Yung-Hui; Huang, Ching-Chun
Format:	Article
Language:	English
Subjects:	Accuracy; Alignment; autonomous vehicle; Feature extraction; Head; Laser radar; LiDAR-based 3D object detection; Neck; Object detection; Object recognition; Plugs; Point cloud compression; Point clouds; Sparsity; Three dimensional models; Three-dimensional displays

Description
Abstract:	State-of-the-art (SOTA) methods use voxelization or pillarization to regularize unordered point clouds, which improves computational efficiency for LiDAR-based 3D object detection. However, they usually trade some accuracy for speed. We therefore pose a new problem: "Is it possible to keep high detection accuracy while point-cloud quantization is applied?" We found that the inconsistent sparsity of the point cloud over depth distance, which remains an open problem, might be the main reason for the accuracy loss. To address this inconsistency, we first propose a new pillar-based vehicle detection model, named SVDnet, with novel plug-ins in its backbone and neck. Specifically, a novel low-rank objective is designed to force the backbone to extract distance/sparsity-aware features and suppress the other feature variations among vehicle samples. Next, we alleviate the remaining feature inconsistency caused by distance/sparsity in the neck through dynamic feature selection and adaptive feature fusion. Feature selection is realized by a position attention network, while feature fusion is achieved by a Distance Alignment Ratio-generation Network (DARN). The selected and fused features, which are less sensitive to sparsity, are then concatenated and fed to an SSD-like detection head. We also integrate the proposed plug-ins into multiple pillar/voxel-based methods to boost their performance. Our evaluation shows that SVDnet improves the average precision for distant cases by 8.11% with a speed drop of only 0.23 milliseconds compared with PointPillars. Furthermore, the extended results validate that our plug-ins help SOTA pillar/voxel-based methods achieve noticeable improvement, especially for far-range objects.
ISSN:	1524-9050 1558-0016
DOI:	10.1109/TITS.2023.3267665
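
Illustrative note: the abstract mentions a low-rank objective on backbone features but does not give its form. The sketch below shows one generic way such a penalty could be written, assuming a batch of per-sample feature vectors stacked into a matrix and a penalty on the singular values beyond the largest few. The function name `low_rank_penalty` and the `keep` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def low_rank_penalty(features: np.ndarray, keep: int = 1) -> float:
    """Sum of the singular values beyond the `keep` largest ones.

    `features` is an (n_samples, n_dims) matrix of feature vectors.
    The value is near zero when the rows span at most `keep` directions,
    so minimizing it pushes the features toward a low-rank structure.
    This is an assumed, generic formulation, not the paper's objective.
    """
    # Singular values come back sorted in descending order.
    singular_values = np.linalg.svd(features, compute_uv=False)
    return float(singular_values[keep:].sum())


if __name__ == "__main__":
    # Every row is a multiple of one direction: rank 1, penalty near zero.
    rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 2.0])
    print(low_rank_penalty(rank_one))

    # The identity has three equal singular values, so two are penalized.
    print(low_rank_penalty(np.eye(3)))
```

In a training setup, a term like this would typically be added to the detection loss with a small weight. How SVDnet actually weights or structures its objective is not stated in this record.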