Visual Tracking based on deformable Transformer and spatiotemporal information
Published in: | Engineering Applications of Artificial Intelligence 2024-01, Vol.127, p.107269, Article 107269 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Transformer-based Siamese trackers have achieved remarkable results on a range of benchmarks, and attention-based Transformer structures are now widely used, especially when trained on large-scale datasets. However, many trackers neglect the fusion and enhancement of local and global features and do not extract spatiotemporal information. In addition, the features of the original Transformer structure are redundant and are affected by irrelevant regions outside the region of interest. To address these issues, we propose a new method (DTS) based on a deformable Transformer and spatiotemporal information. DTS is a Siamese architecture composed of multiple modules. The template frame passes through a 2D CNN and Self-Attention to obtain important features from local to global scale. The search frame passes through a 3D CNN, SFM, and DAM to obtain the spatiotemporal information of interest. Cross-Attention then establishes the correlation between the two branches, and the target's bounding box is finally predicted from its corner points. To verify the effectiveness of our method, we conduct experiments on the LaSOT, GOT-10k, TrackingNet, VOT2018, OTB100, and UAV123 benchmark datasets, where the result metrics are 2%–3% higher than those of the baseline method. The model structure is simplified without affecting performance, and the tracker reaches 50 FPS. The results show that the proposed tracker is very competitive with other state-of-the-art methods. |
ISSN: | 0952-1976 1873-6769 |
DOI: | 10.1016/j.engappai.2023.107269 |