Objformer: Boosting 3D object detection via instance-wise interaction

Bibliographic details
Published in:	Pattern recognition 2024-02, Vol.146, p.110061, Article 110061
Main authors:	Tao, Manli; Zhao, Chaoyang; Tang, Ming; Wang, Jinqiao
Format:	Article
Language:	English
Subjects:	3D object detection; Incompletion and occlusion; Instance-wise interaction; Point clouds
Online access:	Full text

Description
Summary:	Deep learning on point clouds drives 3D object detection. Despite rapid progress, point-based methods still suffer from problems such as incompletion and occlusion, which are caused by the material properties of objects and by cluttered scenes. Such hard targets are difficult to identify and may even be misidentified, which severely weakens the performance of point-based 3D object detection. To alleviate these problems, we propose Objformer, which boosts point-based 3D object detection via instance-wise interaction. We design an instance feature encoder to encode clean instance features, which contain key geometric priors and holistic semantic information. An instance interaction module then aggregates complementary features across instances through label-guided interaction, boosting 3D object detection performance. Experiments show that Objformer outperforms previous state-of-the-art point-based methods on two popular benchmarks, ScanNet V2 and SUN RGB-D. Notably, our single-modal Objformer even outperforms the competing advanced multi-modal fusion method on both SUN RGB-D and ScanNet V2.
• This paper proposes a novel two-stage, end-to-end differentiable architecture for 3D object detection in point clouds, dubbed Objformer.
• Equipped with a specially designed instance feature encoder, Objformer can extract clean instance features and significant geometric priors of the target.
• By encoding the pseudo category label from the 3D proposals into the semantic feature of each instance, Objformer boosts information complementarity across objects with the instance interaction module.
• The proposed Objformer achieves state-of-the-art 3D object detection performance on SUN RGB-D and ScanNet. The significant performance gains on both benchmarks and the improvement over the multi-modal method indicate the superiority of Objformer.
ISSN:	0031-3203 1873-5142
DOI:	10.1016/j.patcog.2023.110061
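
Illustrative sketch: The summary describes aggregating complementary features across instances with label-guided interaction. The record gives no implementation details, so the following is only a minimal sketch of one way such an interaction could look, not the authors' method. It assumes that the pseudo category label is injected by adding a class embedding to each instance feature, and that the interaction is a single-head dot-product self-attention with a residual connection. The function name `label_guided_interaction`, its parameters, and the embedding table are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_guided_interaction(features, pseudo_labels, label_embeddings):
    """Refine instance features by attending across all instances.

    features:         N instance feature vectors, each of length D
    pseudo_labels:    N class indices taken from first-stage proposals
    label_embeddings: C class embedding vectors of length D (hypothetical table)

    Assumption: the label is injected additively and the interaction is
    plain dot-product self-attention; the paper may do this differently.
    """
    # Inject each instance's pseudo category label into its feature.
    tokens = [
        [f + e for f, e in zip(feat, label_embeddings[label])]
        for feat, label in zip(features, pseudo_labels)
    ]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    refined = []
    for query in tokens:
        # Similarity of this instance to every instance, itself included.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Attention-weighted sum of all instance tokens.
        context = [
            sum(w * key[d] for w, key in zip(weights, tokens)) for d in range(dim)
        ]
        # Residual connection keeps the instance's own information.
        refined.append([x + c for x, c in zip(query, context)])
    return refined
```

In this sketch, instances that share a pseudo label receive the same embedding offset, which tends to raise their mutual attention scores, so an occluded or incomplete instance can borrow information from more complete instances of the same predicted class.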