Dual Siamese network for RGBT tracking via fusing predicted position maps

Bibliographic details
Published in: The Visual Computer, 2022-07, Vol. 38 (7), p. 2555-2567
Main authors: Guo, Chang; Yang, Dedong; Li, Chang; Song, Peng
Format: Article
Language: English
Online access: Full text
Abstract: Visual object tracking is a fundamental task in computer vision. Despite its rapid development, tracking with visible-light images alone is unreliable in some situations. Because visible-light and thermal-infrared images have complementary imaging advantages, using them as a joint input for tracking has attracted increasing attention; this is known as RGBT tracking. Existing RGBT trackers can be divided into image-level, feature-level, and response-level fusion tracking. Compared with the first two, response-level fusion can exploit deeper dual-modality image information, but most existing response-level methods rely on traditional tracking frameworks and introduce weights at inappropriate stages. Motivated by this, we propose a deep-learning-based response-level fusion tracking algorithm that moves the weight distribution into the feature extraction stage, for which we design a joint modal channel attention module. We adopt the Siamese framework and extend it into a dual Siamese subnetwork, improve the region proposal subnetwork, and propose a strategy for fusing the predicted position maps of the two modalities. To verify the performance of our algorithm, we conducted experiments on two RGBT tracking benchmarks. Our algorithm achieves very good performance and runs at 116 frames per second, far exceeding the real-time requirement of 25 frames per second.
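
The abstract describes the pipeline only at a high level, so the snippet below is a minimal, hypothetical PyTorch sketch of the two ideas it names: a joint modal channel attention module that re-weights RGB and thermal feature channels during feature extraction, and a response-level fusion of the two predicted position maps. The names (JointModalChannelAttention, fuse_position_maps), the squeeze-and-excitation-style attention, and the confidence-weighted fusion rule are assumptions for illustration, not the authors' published design.

```python
# A minimal sketch, not the authors' implementation: the abstract does not give
# the exact attention design or the map-fusion rule, so this follows a generic
# squeeze-and-excitation pattern and a simple confidence-weighted average.
import torch
import torch.nn as nn


class JointModalChannelAttention(nn.Module):
    """Re-weights RGB and thermal feature channels from a jointly pooled descriptor."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # One MLP sees both modalities and emits per-modality channel weights.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, (2 * channels) // reduction),
            nn.ReLU(inplace=True),
            nn.Linear((2 * channels) // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_tir: torch.Tensor):
        b, c, _, _ = feat_rgb.shape
        joint = torch.cat([self.pool(feat_rgb), self.pool(feat_tir)], dim=1).flatten(1)
        w = self.mlp(joint).view(b, 2 * c, 1, 1)
        w_rgb, w_tir = w[:, :c], w[:, c:]
        return feat_rgb * w_rgb, feat_tir * w_tir


def fuse_position_maps(map_rgb: torch.Tensor, map_tir: torch.Tensor) -> torch.Tensor:
    """Response-level fusion: combine the two predicted position maps,
    weighting each modality by the peak score of its own map."""
    conf_rgb = map_rgb.amax(dim=(-2, -1), keepdim=True)
    conf_tir = map_tir.amax(dim=(-2, -1), keepdim=True)
    weights = torch.softmax(torch.cat([conf_rgb, conf_tir], dim=1), dim=1)
    return weights[:, :1] * map_rgb + weights[:, 1:] * map_tir


if __name__ == "__main__":
    attn = JointModalChannelAttention(channels=256)
    f_rgb, f_tir = torch.randn(1, 256, 25, 25), torch.randn(1, 256, 25, 25)
    f_rgb, f_tir = attn(f_rgb, f_tir)
    # The position maps would come from the two Siamese region proposal branches.
    fused = fuse_position_maps(torch.rand(1, 1, 25, 25), torch.rand(1, 1, 25, 25))
    print(fused.shape)  # torch.Size([1, 1, 25, 25])
```

Feeding a single MLP the concatenated pooled descriptors lets each modality's channel weights depend on both inputs, which is one plausible reading of the "joint modal" attention placed in the feature extraction stage.
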
ISSN: 0178-2789, 1432-2315
DOI: 10.1007/s00371-021-02131-4