TabCtNet: Target-aware bilateral CNN-transformer network for single object tracking in satellite videos


Bibliographic details
Published in: International journal of applied earth observation and geoinformation, 2024-04, Vol. 128, p. 103723, Article 103723
Main authors: Zhu, Qiqi; Huang, Xin; Guan, Qingfeng
Format: Article
Language: eng
Description
Summary:
• A bilateral network for tracking small, weak targets in satellite videos.
• A CNN-Transformer network aggregates local spatial information and temporal context.
• A target-aware block-erasing strategy eliminates the influence of similar objects.
• A pixel-wise refinement module captures fine-grained information for box estimation.
Satellite video object tracking is an emerging technology for dynamically observing the Earth, making it possible to track moving objects within a short time. Deep learning methods such as CNN-based and transformer-based trackers have been widely applied to single object tracking in natural videos. Targets in natural videos are captured by ground sensors, whereas satellite sensors observe from altitudes of hundreds of kilometers or more. Trackers designed for natural videos may therefore be affected by complex backgrounds, especially for small targets with weak features as seen from remote sensing platforms. Furthermore, confusion between the target and visually similar objects, as well as target deformation in satellite videos, can lead to incorrect positioning. To address these problems, we propose a target-aware bilateral CNN-Transformer network (TabCtNet). In TabCtNet, a bilateral CNN-Transformer architecture, which aggregates local spatial information and global temporal context and lets them interact, is designed to tackle the challenge of small targets with weak features against complex, cluttered backgrounds in satellite videos. To reduce the impact of similar objects, a target-aware block-erasing strategy (TAS) generates weakened heatmaps from the template target mask in a data-driven manner. Moreover, a pixel-wise refinement module with corner-based box estimation (PE) extracts finer-grained spatial information to estimate boxes more accurately and to reduce the effect of target deformation.
Experimental results show that TabCtNet quantitatively and qualitatively outperforms advanced single object tracking methods on two satellite video datasets covering four categories of targets from different scenarios. Furthermore, to investigate the generalizability of the TabCtNet framework, satellite videos from different countries, captured by various satellite platforms, were used for evaluation, and the results show robust performance across these scenarios.
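The summary names corner-based box estimation but gives no formula for it. As a minimal sketch of one common way to decode a box from corner score maps (a soft-argmax over a top-left map and a bottom-right map), and not the TabCtNet implementation, the idea could look like the following. The function names `soft_argmax_2d` and `corners_to_box` and the score-map layout (a list of rows, indexed `[y][x]`) are assumptions made for illustration.

```python
import math


def soft_argmax_2d(score_map):
    """Expected (x, y) position under a softmax over a 2D score map.

    score_map is a list of rows, indexed as score_map[y][x] (assumed layout).
    """
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(s for row in score_map for s in row)
    weights = [[math.exp(s - peak) for s in row] for row in score_map]
    total = sum(sum(row) for row in weights)
    # Probability-weighted average of column indices (x) and row indices (y).
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


def corners_to_box(top_left_scores, bottom_right_scores):
    """Decode a box (x1, y1, x2, y2) from two hypothetical corner score maps."""
    x1, y1 = soft_argmax_2d(top_left_scores)
    x2, y2 = soft_argmax_2d(bottom_right_scores)
    return x1, y1, x2, y2


if __name__ == "__main__":
    size = 5
    top_left = [[0.0] * size for _ in range(size)]
    bottom_right = [[0.0] * size for _ in range(size)]
    top_left[1][1] = 50.0       # sharp peak at x=1, y=1
    bottom_right[3][4] = 50.0   # sharp peak at x=4, y=3
    print(corners_to_box(top_left, bottom_right))
```

Because the coordinates are expectations rather than a hard argmax, the decoded box can fall between pixel centers, which is one reason this style of decoding is often paired with pixel-wise feature maps.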
ISSN: 1569-8432, 1872-826X
DOI: 10.1016/j.jag.2024.103723