Spatiotemporal adaptive attention 3D multiobject tracking for autonomous driving

Bibliographic details
Published in:	Knowledge-based systems 2023-05, Vol.267, p.110442, Article 110442
Main authors:	Zhang, Xiaofei; Fan, Zhengping; Tan, Xiaojun; Liu, Qunming; Shi, Yanli
Format:	Article
Language:	English
Subjects:	3D multiobject tracking; Adaptive data association; Multimodal data fusion; Spatiotemporal attention mechanism
Online access:	Full text

Description
Abstract:	Three-dimensional (3D) multiobject tracking (MOT) is an essential perception task for autonomous vehicles (AVs). Studies have indicated that multimodal data fusion can provide more stable and efficient perception information to AVs than a single sensor. Therefore, this paper proposes a new spatiotemporal adaptive attention 3D (3DSTAA) tracker, which attempts to improve the tracking performance of end-to-end 3D MOT by adaptively correlating spatiotemporal data. The novelty of this paper is twofold. (1) Unlike nonintelligent fusion methods, this paper uses an efficient, adaptive spatial-guided fusion (SGFus) module for multimodal feature fusion. As a result, the 3D structural information obtained from point cloud data provides spatial information that complements the 2D texture information extracted from the image data, and the two jointly refine the representation of perception information in the margin area. (2) This paper develops a spatiotemporal object-unique attention (STOUA) module that calculates the relational degree of each perceived object between two adjacent frames through attentional encoding. At the same time, an adaptive weighting strategy is used to further study the spatiotemporal correlation of unique objects, reducing the similarity among different objects and the differences across observations of the same object. Experiments on the KITTI tracking benchmark show that the 3DSTAA tracker is highly competitive in both inference time and tracking performance compared with state-of-the-art (SOTA) methods. Our corresponding code will be released at https://github.com/xf-zh.
• We propose an end-to-end MOT method without additional annotation information.
• We design an efficient SGFus module to fuse the multimodal features adaptively.
• We develop an STOUA module to maintain the degree of association between objects.
• We design a class-specific affinity metric to match the occluded objects correctly.
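
The abstract describes scoring the relation between each object in two adjacent frames through attention and then associating objects across frames. The following is a minimal illustrative sketch of that general idea only; it is not the authors' STOUA module or their affinity metric. The function names (attention_affinity, greedy_match), the hand-written feature vectors, the greedy matching step, and the 0.5 threshold are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_affinity(prev_feats, curr_feats):
    """Scaled dot-product attention weights between two frames.

    Row i holds the weights of previous-frame object i over all
    current-frame objects (hypothetical stand-in for a learned module).
    """
    scale = math.sqrt(len(prev_feats[0]))
    scores = [
        [sum(p * c for p, c in zip(pf, cf)) / scale for cf in curr_feats]
        for pf in prev_feats
    ]
    return [softmax(row) for row in scores]


def greedy_match(affinity, min_affinity=0.5):
    """Pair objects by descending affinity; each object is used at most once.

    min_affinity is an assumed cutoff below which no match is made.
    """
    candidates = sorted(
        ((a, i, j) for i, row in enumerate(affinity) for j, a in enumerate(row)),
        reverse=True,
    )
    used_prev, used_curr, matches = set(), set(), []
    for a, i, j in candidates:
        if a < min_affinity:
            break
        if i in used_prev or j in used_curr:
            continue
        matches.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    return sorted(matches)


# Toy embeddings: two tracked objects, three detections in the next frame.
prev_feats = [[4.0, 0.0], [0.0, 4.0]]
curr_feats = [[0.0, 4.0], [4.0, 0.0], [1.0, 1.0]]
print(greedy_match(attention_affinity(prev_feats, curr_feats)))
# [(0, 1), (1, 0)]
```

In this toy input the third detection stays unmatched, which a tracker would typically treat as a candidate new track. A real system would learn the embeddings and would likely use an optimal assignment rather than greedy matching.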
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2023.110442