Deep learning and multi-modal fusion for real-time multi-object tracking: Algorithms, challenges, datasets, and comparative study


Bibliographic details
Published in: Information fusion 2024-05, Vol.105, p.102247, Article 102247
Main authors: Wang, Xuan; Sun, Zhaojie; Chehri, Abdellah; Jeon, Gwanggil; Song, Yongchao
Format: Article
Language: English
Description
Summary: Real-time multi-object tracking (MOT) is a complex task that involves detecting multiple objects and tracking them. Once the objects are detected, they are assigned markers, and their trajectories are tracked in real time. The scientific community is interested in applying MOT technology in smart cities, with a primary focus on intelligent transportation, vehicle and pedestrian detection, crowd surveillance, and public safety. Deep learning techniques have been developed in recent years to tackle the challenges of real-time MOT tasks and improve tracking performance. Environmental perception in smart traffic applications relies heavily on sensor data fusion. In traffic scenarios, combining sensors and cameras makes it possible to detect and track targets while gathering valuable data. However, this approach faces challenges when detecting and tracking objects that are in motion, undergo complex changes in appearance, or appear in crowded scenes. This paper explores the foundational baselines for real-time MOT tasks. It prioritizes quantitative measures through a comprehensive analysis of widely used benchmark datasets and metrics. The study also investigates established embedding techniques and multi-modal fusion methods within real-time multi-target tracking algorithms. Each strategy is classified and assessed according to a predefined set of principles. The paper presents a comprehensive analysis and visualization of various MOT strategies. Finally, it gives an overview of the current challenges facing MOT and of directions for future work.
• Typical baselines associated with real-time MOT tasks are analyzed.
• Different embedding methods and real-time MOT algorithms are classified and discussed.
• Popular benchmark datasets and metrics are summarized and compared quantitatively and comprehensively.
• A variety of MOT approaches are evaluated and visualized from multiple deep learning perspectives.
• The challenges and future directions of current MOT tasks are identified.
ISSN: 1566-2535, 1872-6305
DOI: 10.1016/j.inffus.2024.102247