Exploring the complementarity between convolution and transformer matching for visual tracking

Bibliographic details
Published in: Knowledge-based systems 2024-09, Vol.300, p.112184, Article 112184
Main authors: Wang, Zheng’ao, Li, Ming, Pei, Wenjie, Lu, Guangming, Chen, Fanglin
Format: Article
Language: English
Online access: Full text
Description
Summary: The essence of Siamese trackers is similarity matching between the deep feature of a target template and the deep feature of a search region. With the successful application of the Transformer in the vision community, similarity matching is moving from convolution matching to Transformer matching. While this transition brings a performance boost, we find that there is an intuitive complementarity between convolution matching and Transformer matching. Employing only one of the two is therefore suboptimal for trackers, and exploiting their complementarity holds great potential. To this end, we present a Matching Knowledge Fusion (MKF) module that efficiently integrates a convolution matching and an enhanced Transformer matching to exploit this complementarity. Furthermore, to address the issue that noisy and ambiguous attention weights in Transformer matching degrade the matching results, we propose a novel mechanism that uses complementary matching knowledge to correct the attention weights. Based on the MKF module, we build a simple but effective tracker, dubbed MKFTrack. Extensive experiments demonstrate the favorable performance of our tracker against state-of-the-art trackers.
ISSN:0950-7051
DOI:10.1016/j.knosys.2024.112184