Near-Online Multi-Pedestrian Tracking via Combining Multiple Consistent Appearance Cues

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2021-04, Vol. 31 (4), pp. 1540-1554
Authors: Feng, Weijiang; Lan, Long; Luo, Yong; Yu, Yue; Zhang, Xiang; Luo, Zhigang
Format: Article
Language: English
Description
Abstract: An important cue for multi-pedestrian tracking in video is that an individual's appearance remains consistent over an extended period. In this paper, we address multi-pedestrian tracking by learning a robust appearance model within the tracking-by-detection paradigm. To separate detections of different pedestrians while assembling detections of the same pedestrian, we take advantage of the consistent-appearance cue and exploit three types of evidence: from the recent, the past, and the near future. Existing online approaches exploit only the detection-to-detection and sequence-to-detection metrics, which focus on recent and past appearance patterns respectively, while future pedestrian appearance is simply ignored. This drawback is remedied by further considering a sequence-to-sequence metric, which draws on near-future appearance representations. Adaptive combination weights are learned to fuse these three metrics. Moreover, we propose a novel Focal Triplet Loss that makes the model focus more on hard examples than on easy ones; we demonstrate that this significantly enhances the discriminating power of the model compared with treating every sample equally. The effectiveness and efficiency of the proposed method are verified through comprehensive ablation studies and comparisons with many competitive (offline/online/near-online) counterparts on the MOT16 and MOT17 Challenges.
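
The two most concrete technical elements named in the abstract are the adaptive fusion of the three appearance metrics and the Focal Triplet Loss. The PyTorch sketch below illustrates one plausible reading of each under stated assumptions: the function names, the softmax-normalised fusion weights, and the sigmoid-based hardness weighting are assumptions for illustration and are not taken from the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def fuse_affinities(d2d, s2d, s2s, logits):
        # Combine detection-to-detection, sequence-to-detection and
        # sequence-to-sequence affinities with learned weights (hypothetical:
        # softmax keeps the three weights positive and summing to one).
        w = torch.softmax(logits, dim=0)
        return w[0] * d2d + w[1] * s2d + w[2] * s2s

    def focal_triplet_loss(anchor, positive, negative, margin=0.3, gamma=2.0):
        # Triplet margin loss with a focal-style modulating factor:
        # easy triplets (negative already far from the anchor) are
        # down-weighted so gradients concentrate on hard examples.
        d_ap = F.pairwise_distance(anchor, positive)   # anchor-positive distance
        d_an = F.pairwise_distance(anchor, negative)   # anchor-negative distance
        base = F.relu(d_ap - d_an + margin)            # standard triplet hinge
        hardness = torch.sigmoid(d_ap - d_an)          # in (0, 1), larger = harder
        return (hardness.pow(gamma) * base).mean()

The focal weight plays the same role for metric learning that the focal loss plays for classification: triplets that already satisfy the margin contribute little, while hard violators keep a weight close to one.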
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2020.3005662