Tracking subjects and detecting relationships in crowded city videos

Bibliographic details
Published in:	Multimedia tools and applications 2024-02, Vol.83 (5), p.15339-15361
Main authors:	Elias, Petr; Macko, Matus; Sedmidubsky, Jan; Zezula, Pavel
Format:	Article
Language:	eng
Subjects:	1158T: Role of Computer Vision in Smart Cities: Applications and Research Challenges; Boxes; Cities; Computer Communication Networks; Computer Science; Computer vision; Couples; Crowd monitoring; Data processing; Data Structures and Information Theory; Deep learning; Information processing; Machine vision; Multimedia; Multimedia Information Systems; Public safety; Safety management; Smart cities; Social factors; Social interaction; Special Purpose and Application-Based Systems; Tracking; Urban planning; Usability; Video
Online access:	Full text

Description
Abstract:	Multi-subject tracking in crowded videos is an established yet challenging research direction in computer vision and information processing. Multi-subject tracking is highly applicable in smart cities (e.g., public safety, crowd management, urban planning), autonomous vehicles, robotic vision, and psychology (e.g., social interaction and crowd behavior understanding). In this work, we propose a real-time approach that reveals tracks of subjects in ordinary videos captured in highly populated pedestrian areas such as squares, malls, and stations. The tracks are discovered based on the proximity of detected bounding boxes of subjects in consecutive video frames. Track fragmentation and identity switching are reduced by a re-identification phase that uses caching of unassociated detections and mutual projection of interrupted tracks. Because the proposed approach does not require time-consuming extraction of appearance-based features, it achieves superior tracking speed. In addition, we demonstrate tracker usability and applicability by extracting valuable information about body-joint positions from discovered tracks, which opens promising possibilities for detecting human relationships and interactions. We demonstrate accurate detection of couples based on their hand-holding activity and of families based on children's body proportions. The discovery of these entitative groups is especially challenging in crowded city scenes where many subjects appear in each frame.
ISSN:	1573-7721 1380-7501
DOI:	10.1007/s11042-021-11891-z
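
Illustrative sketch (not part of the catalog record): the abstract describes linking detections across consecutive frames by bounding-box proximity and keeping interrupted tracks available for re-identification. The Python below is a minimal, hypothetical illustration of that general idea and is not the authors' implementation. The class name `ProximityTracker`, the greedy nearest-center matching, and the parameters `max_dist` and `max_missed` are all assumptions made for this example. The paper's caching of unassociated detections and mutual projection of interrupted tracks are not reproduced here; a track is simply kept for a few frames after its last match.

```python
from dataclasses import dataclass
from itertools import count
from math import hypot

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def center(box: Box) -> tuple[float, float]:
    """Return the center point of a bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


@dataclass
class Track:
    track_id: int
    box: Box
    missed: int = 0  # consecutive frames without a matching detection


class ProximityTracker:
    """Greedy nearest-center tracker (hypothetical, for illustration only)."""

    def __init__(self, max_dist: float = 50.0, max_missed: int = 5):
        self.max_dist = max_dist      # assumed association radius
        self.max_missed = max_missed  # assumed lifetime of an interrupted track
        self.tracks: list[Track] = []
        self._ids = count(1)

    def update(self, detections: list[Box]) -> dict[int, Box]:
        # Collect every track/detection pair that is close enough.
        pairs = []
        for ti, track in enumerate(self.tracks):
            tx, ty = center(track.box)
            for di, det in enumerate(detections):
                dx, dy = center(det)
                dist = hypot(tx - dx, ty - dy)
                if dist <= self.max_dist:
                    pairs.append((dist, ti, di))

        # Associate greedily, nearest pairs first, one detection per track.
        used_tracks, used_dets = set(), set()
        for _, ti, di in sorted(pairs):
            if ti in used_tracks or di in used_dets:
                continue
            used_tracks.add(ti)
            used_dets.add(di)
            self.tracks[ti].box = detections[di]
            self.tracks[ti].missed = 0

        # Unmatched tracks age; they can still be picked up in later frames.
        for ti, track in enumerate(self.tracks):
            if ti not in used_tracks:
                track.missed += 1

        # Unmatched detections start new tracks.
        for di, det in enumerate(detections):
            if di not in used_dets:
                self.tracks.append(Track(next(self._ids), det))

        # Drop tracks that have been interrupted for too long.
        self.tracks = [t for t in self.tracks if t.missed <= self.max_missed]
        return {t.track_id: t.box for t in self.tracks if t.missed == 0}
```

In this sketch, a subject that goes undetected for a frame or two keeps its identifier when it reappears near its last known position, which is a much simpler stand-in for the re-identification phase the abstract mentions.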