SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation


Bibliographic Details
Published in: Advanced Engineering Informatics, 2024-10, Vol. 62, p. 102955, Article 102955
Authors: Li, Jingzhong; Yang, Lin; Shi, Zhen; Chen, Yuxuan; Jin, Yue; Akiyama, Kanta; Xu, Anze
Format: Article
Language: English
Online access: Full text
Description
Abstract:

Highlights:
• An efficient multi-view 3D object detection algorithm, SparseDet, is proposed to construct a sparse scene representation, which remarkably diminishes computational demands while enhancing detection accuracy.
• A sparse sampling strategy utilizing both category-aware and geometry-aware supervision is introduced to enhance foreground recognition.
• A background aggregation module is designed to condense extensive background features into a compact set, significantly reducing computational costs while adaptively retaining contextual information.
• Experimental results demonstrate the superiority of our method over state-of-the-art methods in both detection accuracy and inference speed.

Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, owing to the learning of dense scene representations, existing methods still suffer from high computational costs and excessive noise, affecting the efficiency and accuracy of the inference process. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module designed to compress extensive background features into a compact set. These strategies markedly diminish feature volume while preserving essential information, boosting computational efficiency without compromising accuracy. Owing to this efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses PETR while reducing the decoder computational complexity by 47% in terms of FLOPs, achieving a leading inference speed of 35.6 FPS on a single RTX 3090 GPU.
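To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' published implementation: it keeps only the top-scoring "foreground" tokens from a flattened multi-view feature map and pools the remaining features into a small set of "background" context vectors that a decoder could attend to. All module names, tensor shapes, token budgets, and the soft-assignment pooling scheme are illustrative assumptions written in generic PyTorch.

```python
# Illustrative sketch only: sparse foreground sampling + background aggregation.
# Shapes, layer choices, and hyperparameters are assumptions, not SparseDet's code.
import torch
import torch.nn as nn


class SparseSceneTokens(nn.Module):
    def __init__(self, dim: int = 256, num_fg: int = 900, num_bg: int = 16):
        super().__init__()
        self.fg_score = nn.Linear(dim, 1)         # per-token foreground score
        self.bg_assign = nn.Linear(dim, num_bg)   # soft assignment to background slots
        self.num_fg = num_fg

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C) flattened multi-view image features
        scores = self.fg_score(tokens).squeeze(-1)            # (B, N)
        fg_idx = scores.topk(self.num_fg, dim=1).indices      # indices of kept tokens
        fg = torch.gather(
            tokens, 1, fg_idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        )                                                      # (B, num_fg, C)

        # Pool all tokens into a few background slots via soft assignment;
        # a real implementation would exclude the selected foreground tokens.
        assign = self.bg_assign(tokens).softmax(dim=1)         # (B, N, num_bg)
        bg = torch.einsum("bnk,bnc->bkc", assign, tokens)      # (B, num_bg, C)

        # The decoder now attends to a much smaller token set.
        return torch.cat([fg, bg], dim=1)                      # (B, num_fg + num_bg, C)


if __name__ == "__main__":
    feats = torch.randn(2, 6 * 16 * 44, 256)   # e.g. 6 cameras, 16x44 feature maps
    out = SparseSceneTokens()(feats)
    print(out.shape)                            # torch.Size([2, 916, 256])
```

The computational saving comes from the decoder's cross-attention scaling with the number of scene tokens: reducing several thousand dense tokens to roughly nine hundred foreground plus a handful of background slots is what the abstract credits for the reported FLOPs reduction and higher FPS.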
ISSN: 1474-0346
DOI: 10.1016/j.aei.2024.102955