Boosting 3D point-based object detection by reducing information loss caused by discontinuous receptive fields

The point-based 3D object detection method is highly advantageous due to its lightweight nature and fast inference speed, making it a valuable asset in engineering fields such as intelligent transportation and autonomous driving. However, current advanced methods solely focus on learning features fr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International journal of applied earth observation and geoinformation 2024-08, Vol.132, p.104049, Article 104049
Hauptverfasser: Liang, Ao, Hua, Haiyang, Fang, Jian, Zhao, Huaici, Liu, Tianci
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The point-based 3D object detection method is highly advantageous due to its lightweight nature and fast inference speed, making it a valuable asset in engineering fields such as intelligent transportation and autonomous driving. However, current advanced methods solely focus on learning features from the provided point cloud, neglecting the active role of unoccupied space. This results in the problem of discontinuous receptive field (DRF), leading to the loss of semantic and geometric information of the objects. To address this issue, we propose a new end-to-end single-stage point-based model, DRF-SSD, in this paper. DRF-SSD utilizes a PointNet++-style 3D backbone to maintain fast inference capability. Then, point-wise features are projected onto a plane in the Neck structure, and local and global information are aggregated through the designed Hierarchical Encoding–Decoding (HED) and Hybrid Transformer (HT) modules. The former fills in features for unoccupied space through convolutional layers, enhancing local features by interacting with features in occupied space during the learning process. The latter further expands the receptive field using the global learning ability of transformers. The spatial transformation and learning processes in HED and HT only involve key points, and HED is designed to have a special structure that maintains the sparsity of feature maps, preserving the efficiency of the model’s inference. Finally, query features are back-projected onto points for feature enhancement and input into the detection head for prediction. Extensive experiments on the KITTI datasets demonstrate that DRF-SSD achieves superior detection accuracy compared to previous methods, with significant improvements. Specifically, the approach obtains 2.25%, 0.66%, and 0.42% improvement for the metric of 3D Average Precision (AP3D) under the easy, moderate, and hard settings, respectively. Additionally, the method enables other point-based detectors to achieve substantial gains, demonstrating its effectiveness. Our code will be made available at https://github.com/AlanLiangC/DRF-SSD.git. •We identified the issue of discontinuous receptive fields (DRF).•The first model to address DRF with a point-based backbone is proposed.•DRF-SSD achieved a 1.1% improvement in mAP3D compared to the art models.•The designed HED and HT modules have excellent plug-and-play capabilities.
ISSN:1569-8432
DOI:10.1016/j.jag.2024.104049