M‐CoTransT: Adaptive spatial continuity in visual tracking

Detailed Description

Bibliographic Details
Published in: IET Computer Vision, 2022-06, Vol. 16 (4), pp. 350-363
Main authors: Fan, Chunxiao; Zhang, Runqing; Ming, Yue
Format: Article
Language: English
Online access: Full text
Description
Abstract: Visual tracking is an important area of computer vision. Current Siamese-network-based tracking methods employ self-attention blocks in convolutional networks to extract semantic features that encode an object's image structure. However, spatial continuity is a point of contradiction between two seemingly unrelated challenges in tracking: occlusion and similar distractors. Accurately relocating a target that reappears after occlusion is a spatially discontinuous task, whereas bounding-box predictions should be constrained by spatial continuity to prevent them from jumping onto similar distractors. This study proposes M-CoTransT, a novel tracking method that introduces spatial continuity into visual tracking through a confidence-based adaptive Markov motion model (M-model) and a novel correlation-based feature fusion network (CoTransT). In particular, the M-model assigns confidence to the nodes of the Markov motion model to estimate motion-state continuity, and it predicts a more accurate search region for CoTransT, which in turn adds a cross-correlation branch to the self-attention tracking network to enhance the continuity of the target's appearance in feature space. Extensive experiments on five challenging datasets (LaSOT, GOT-10k, TrackingNet, OTB-2015 and UAV123) demonstrate the effectiveness of the proposed M-CoTransT in visual tracking.
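The abstract's idea of a confidence-gated motion model can be illustrated with a minimal sketch. This is not the paper's actual M-model; the function name, the constant-velocity assumption, and the confidence threshold are all illustrative assumptions. The sketch shows the general principle: when tracker confidence is high, extrapolate the target's motion as a first-order Markov step; when confidence is low (e.g. under occlusion), keep the last reliable center and widen the search region rather than letting the prediction jump.

```python
def predict_search_center(positions, confidence, conf_thresh=0.5):
    """Hypothetical sketch of confidence-gated Markov motion prediction.

    positions: list of (x, y) past target centers, most recent last.
    confidence: tracker confidence for the latest frame, in [0, 1].
    Returns a predicted search-region center and a scale factor for
    enlarging the search region.
    """
    x1, y1 = positions[-1]
    if len(positions) < 2 or confidence < conf_thresh:
        # Low confidence (e.g. occlusion): do not extrapolate motion;
        # stay at the last reliable center and enlarge the search region.
        return (x1, y1), 2.0
    x0, y0 = positions[-2]
    # High confidence: take a constant-velocity Markov step, weighted by
    # the confidence score, so the predicted box moves smoothly instead
    # of jumping onto a distant, similar-looking distractor.
    vx, vy = x1 - x0, y1 - y0
    return (x1 + confidence * vx, y1 + confidence * vy), 1.0
```

For example, with past centers `[(0, 0), (10, 0)]` and confidence 0.9, the prediction continues along the motion direction; with confidence 0.2, it stays at `(10, 0)` with a doubled search region.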
ISSN: 1751-9632; 1751-9640
DOI: 10.1049/cvi2.12092