CRTrack: Learning Correlation-Refine network for visual object tracking

Conducting reliable feature interaction plays a critical role in the visual tracking community, especially in recent dominated Siamese-based tracking paradigm. In general, there are two primary approaches for fusing representations from template and search area in the Siamese setting, i.e., cross-co...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Pattern recognition 2024-10, Vol.154, p.110582, Article 110582
Hauptverfasser: Zhang, Wenkang, Xie, Fei, Xu, Tianyang, Zhai, Jiang, Yang, Wankou
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Conducting reliable feature interaction plays a critical role in the visual tracking community, especially in recent dominated Siamese-based tracking paradigm. In general, there are two primary approaches for fusing representations from template and search area in the Siamese setting, i.e., cross-correlation and transformer modeling. The former provides a straightforward interaction solution, which may have limitations in handling complex scenarios, such as appearance variations and occlusion. While the latter offers an effective interaction mechanism, albeit with higher computation complexity and model cost. In contrast to traditional Siamese-based trackers which rely on two mentioned feature cross-correlation operators, this paper proposes a novel Correlation-Refine network to address the issue of lacking semantic information caused by local linear matching in correlation, from both spatial and channel perspectives. Correlation-Refine network (named CR) is solely built on top of fully convolutional layers, without employing intricate transformer mechanisms or complex methods to fuse features from multiple scales. Moreover, we present a concise yet effective convolutional tracking framework based on the correlation-refine network. CR network can increase the discriminative ability of semantic information in a coarse-to-fine manner: it gradually learns the semantic features of the target to be tracked and suppresses interference from similar objects by stacking multiple CR layers. Extensive experiments and comparisons with recent competitive trackers in challenging large-scale benchmarks demonstrate that, our tracker outperforms all previous convolutional trackers and has competitive results with transformer-based method. The code will be made available. •We put forward Correlation-Refine network to address the issue of lacking semantic information.•We construct an efficient tracking framework, gradually eliminating similarity interference from semantic level.•Our tracker outperforms all previous convolutional trackers on four authoritative benchmarks.
ISSN:0031-3203
1873-5142
DOI:10.1016/j.patcog.2024.110582