Robust Visual Tracking via Multi-Scale Spatio-Temporal Context Learning

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Oct. 2018, Vol. 28, No. 10, pp. 2849-2860
Authors: Xue, Wanli; Xu, Chao; Feng, Zhiyong
Format: Article
Language: English
Abstract: In order to tackle the incompleteness and inaccuracy of samples in most tracking-by-detection algorithms, this paper presents an object tracking algorithm termed multi-scale spatio-temporal context (MSTC) learning tracking. MSTC collaboratively explores three different types of spatio-temporal context, namely long-term historical targets, a medium-term stable scene (i.e., a short, continuous, and stable video sequence), and short-term overall samples, to improve tracking efficiency and reduce drift. Unlike the conventional multi-timescale tracking paradigm, which chooses samples in a fixed manner, MSTC formulates a low-dimensional representation, a fast perceptual hash algorithm, to dynamically update the long-term historical targets and the medium-term stable scene according to image similarity. MSTC also differs from most tracking-by-detection algorithms, which label samples as positive or negative, in that it investigates fusion salient sample detection to weight samples not only by distance information but also by visual spatial attention cues such as color, intensity, and texture. Extensive experimental comparisons with state-of-the-art algorithms on the standard 50-video benchmark demonstrate the superiority of the proposed algorithm.
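
The abstract does not specify the authors' exact hash construction. As an illustration only, the following is a minimal sketch of a generic average-hash style perceptual hash in Python (assuming NumPy and grayscale image arrays), showing how a low-dimensional binary representation can score image similarity when deciding whether to update the long-term and medium-term contexts. The function names, the 8x8 hash size, and the similarity measure are illustrative assumptions, not details taken from the paper.

```python
# Sketch of an average-hash style perceptual hash for image similarity.
# This is a generic aHash, NOT the authors' exact "fast perceptual hash
# algorithm"; sizes and thresholds here are illustrative assumptions.
import numpy as np

def perceptual_hash(image: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Binary hash of a 2-D grayscale array: downsample to a
    hash_size x hash_size grid and threshold each cell at the global mean."""
    h, w = image.shape
    ys = np.linspace(0, h, hash_size + 1, dtype=int)
    xs = np.linspace(0, w, hash_size + 1, dtype=int)
    # Average each block of the grid to get a coarse image.
    small = np.array([[image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(hash_size)]
                      for i in range(hash_size)])
    # Each bit records whether its block is brighter than the mean.
    return (small > small.mean()).ravel()

def hash_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    return 1.0 - np.count_nonzero(a != b) / a.size
```

In such a scheme, a stored context image would be replaced only when the similarity between its hash and the current target's hash falls below some threshold, which matches the abstract's description of updating contexts dynamically with image similarity.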
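Likewise, the fused sample weighting can be pictured as combining a spatial distance prior with a saliency score built from color, intensity, and texture responses. The sketch below is a hypothetical illustration under an assumed Gaussian distance weighting and a simple convex combination; the helper name `fused_sample_weight` and the parameters `sigma` and `alpha` are not from the paper.

```python
# Hypothetical sketch of weighting a sample by both its distance to the
# target and its visual saliency, as the abstract describes qualitatively.
import numpy as np

def fused_sample_weight(dist: float, saliency: float,
                        sigma: float = 25.0, alpha: float = 0.5) -> float:
    """Fuse a distance cue and a saliency cue into one sample weight.

    dist     : Euclidean distance (pixels) from the sample to the target center.
    saliency : score in [0, 1], e.g., a combination of color, intensity,
               and texture channel responses.
    sigma, alpha : assumed bandwidth and mixing parameters (illustrative).
    """
    spatial = np.exp(-dist ** 2 / (2.0 * sigma ** 2))  # Gaussian distance prior
    return alpha * spatial + (1.0 - alpha) * saliency  # convex fusion of cues
```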
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2017.2720749