Instance-Based Feature Pyramid for Visual Object Tracking

Bibliographic details
Published in: IEEE transactions on circuits and systems for video technology 2022-06, Vol.32 (6), p.3774-3787
Main authors: Pi, Zhixiong; Shao, Yuanjie; Gao, Changxin; Sang, Nong
Format: Article
Language: eng
Description
Abstract: Deep learning based methods have significantly improved visual tracking precision. However, background distraction and high-precision localization remain challenging problems. Although some methods fuse deep and shallow layer features to address these problems, existing fusion methods, such as simply concatenating or adding features from different layers, cannot fully exploit the advantages of both deep and shallow layer features. In this paper, we propose a new adaptive feature fusion method, called the instance-based feature pyramid (IBFP), to obtain a discriminative high-resolution feature that not only inherits the discriminative information of the deep layer feature but also keeps the high-precision localization information of the shallow layer feature. To use the deep and shallow features effectively, we design an instance-based upsampling (IBU) module to fuse them and a compressed space channel selection (CSCS) module to re-weight the feature channels adaptively. We insert the IBU and CSCS modules into the Siamese tracker for end-to-end training and testing. Using the proposed IBU and CSCS modules, we fuse the deep and shallow features in series. Experiments on large-scale benchmark datasets demonstrate that the proposed modules improve the ability to distinguish targets from similar distractors and perform favorably against state-of-the-art methods.
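The abstract contrasts simple fusion (adding or concatenating deep and shallow features) with adaptive channel re-weighting. The sketch below illustrates only those generic ideas on tiny nested-list feature maps. It is not the paper's IBU or CSCS module, whose internals this record does not describe; all function names are hypothetical, and the sigmoid-of-mean gate is an assumed stand-in for a learned channel selector.

```python
import math

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a [C][H][W] nested-list feature map."""
    out = []
    for channel in feat:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]   # repeat columns
            rows.extend(list(wide) for _ in range(factor))   # repeat rows
        out.append(rows)
    return out

def fuse_add(deep, shallow):
    """Baseline fusion: element-wise sum of two maps with identical shape."""
    return [[[d + s for d, s in zip(d_row, s_row)]
             for d_row, s_row in zip(d_ch, s_ch)]
            for d_ch, s_ch in zip(deep, shallow)]

def fuse_concat(deep, shallow):
    """Baseline fusion: stack the channels of both maps."""
    return deep + shallow

def channel_gates(feat):
    """One gate per channel: sigmoid of the channel's global average.
    Assumption: a real module would learn this mapping instead."""
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates

def reweight(feat, gates):
    """Scale every channel by its gate."""
    return [[[g * v for v in row] for row in channel]
            for channel, g in zip(feat, gates)]

# Toy data: a 2-channel 1x1 "deep" map and a 2-channel 2x2 "shallow" map.
deep = [[[1.0]], [[-1.0]]]
shallow = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]

up = upsample_nearest(deep, 2)        # deep map brought to 2x2 resolution
added = fuse_add(up, shallow)         # fixed, non-adaptive fusion
stacked = fuse_concat(up, shallow)    # 4 channels, no weighting
gates = channel_gates(added)          # data-dependent channel weights
selected = reweight(added, gates)
```

In this toy case the additive fusion treats both channels identically, while the gate scales each channel by a value computed from its own content, which is the general sense in which re-weighting is "adaptive".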
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2021.3113041