SIFNet: A self-attention interaction fusion network for multisource satellite imagery template matching


Bibliographic details
Published in:	International Journal of Applied Earth Observation and Geoinformation, 2023-04, Vol. 118, p. 103247, Article 103247
Authors:	Liu, Ming; Zhou, Gaoxiang; Ma, Lingfei; Li, Liangzhi; Mei, Qiong
Format:	Article
Language:	English
Keywords:	Interaction fusion; Multisource remote sensing registration; Template matching; Transformer

Description
Abstract:
• A novel Transformer-based deep fusion network (SIFNet) was proposed for remote sensing image matching.
• Multiscale features were extracted by a pyramid network for each pixel.
• The extracted features were fused by self-attention layers in a Transformer for information interaction.
• Two loss functions were developed to convert satellite imagery registration into classification and regression tasks for efficient matching.

Multisource satellite images provide abundant and complementary Earth observations, but nonlinear radiometric and geometric distortions (such as scale and rotation variations) between these multimodal images pose substantial challenges for downstream remote sensing applications such as change detection. We therefore proposed a template matching algorithm based on a self-attention interactive fusion network, named SIFNet, to align multisource satellite images. First, with the template and reference images as inputs, a feature pyramid network was used to extract multiscale features for each pixel. Second, the extracted features were fused by self-attention layers in a Transformer for information interaction. Third, similarity and semantic matching loss functions were developed to convert satellite imagery registration into a regression task, allowing SIFNet to align multimodal patch images more efficiently based on point-to-point correspondence, instead of searching globally for extrema as previous matching strategies did. We performed experiments on four multimodal datasets (i.e., Google, GF-2, Landsat-8, and optical-SAR images) covering various scenes to evaluate the performance and robustness of SIFNet. The results demonstrate that SIFNet achieved template matching accuracy comparable to that of other algorithms and was robust to geometric distortions and radiometric variations in multisource remote sensing data.
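The fusion step summarized above (template and reference features exchanging information through self-attention) can be illustrated with a minimal sketch. It assumes, for illustration only, that the per-pixel feature vectors of both images are concatenated into one token sequence and passed through a single-head scaled dot-product self-attention layer with identity query, key and value projections. The abstract does not specify SIFNet's actual layer configuration, and the function names `self_attention` and `fuse_template_and_reference` are hypothetical. The feature pyramid, learned projections, positional encoding and the loss functions are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption (not SIFNet's configuration): query, key and value
    projections are the identity, so Q = K = V = tokens.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in tokens:
        # Scaled dot-product similarity of this token to every token in the sequence.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        fused.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return fused


def fuse_template_and_reference(
    template_feats: List[Vector], reference_feats: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    # Hypothetical helper: concatenate both images' per-pixel features into one
    # sequence so attention can mix information across the two images.
    joint = template_feats + reference_feats
    fused = self_attention(joint)
    n = len(template_feats)
    return fused[:n], fused[n:]


# Toy 2-D features for a 2-pixel template and a 3-pixel reference.
template = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
t_fused, r_fused = fuse_template_and_reference(template, reference)
print(len(t_fused), len(r_fused))  # 2 3
```

Because template and reference tokens share one attention pool in this sketch, each fused template vector is a weighted mix that includes reference-image features, and vice versa; a stack of such layers with learned projections is what a Transformer encoder would use in practice.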
ISSN:	1569-8432; 1872-826X
DOI:	10.1016/j.jag.2023.103247