A Semi-Supervised Pyramid Cross-Temporal Attention Transformer for Change Detection in High-Resolution Remote Sensing Images

Detailed description

Bibliographic details
Published in:	IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, pp. 1-5
Main authors:	Lv, Pengyuan; Li, Mengchen; Zhong, Yanfei
Format:	Article
Language:	English
Subjects:	Attention mechanism; Change detection; change detection (CD); Computed tomography; Decoding; Encoders-Decoders; Error reduction; Feature extraction; high-resolution remote sensing images; Image resolution; Information processing; Parameters; Remote sensing; semi-supervised learning; Spatial resolution; Training; transformer; Transformers

Description
Abstract:	The vision transformer (ViT) can model long-range dependencies in imagery and has been studied for remote sensing image change detection (CD). However, the performance of existing transformer-based CD methods is unsatisfactory when labeled data are limited. The original self-attention (SA) mechanism cannot effectively extract change information, and the large number of parameters makes the ViT model difficult to train. To address these problems, this letter proposes a semi-supervised pyramid cross-temporal attention transformer for change detection (CT2RCDSS). CT2RCDSS follows an encoder-decoder structure. The encoder uses a dual-branch structure that combines pyramid cross-temporal attention (PCTA) and pyramid SA (PSA) mechanisms, designed to capture the interaction between features from different time phases and to enhance changes at different scales. The decoder uses a series of deconvolutional layers with skip connections, followed by a Softmax layer that produces the final binary change map. In addition, a semi-supervised training strategy, which reduces the errors in the pseudo-labels generated from models initialized with different parameters, is used to improve model stability when using unlabeled data. Experiments showed that the proposed method achieves a superior F1-score and intersection over union (IoU), indicating its potential.
ISSN:	1545-598X 1558-0571
DOI:	10.1109/LGRS.2024.3404645
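
Illustrative sketch (not part of the catalog record or the authors' code): the abstract mentions attention across time phases and evaluation by F1-score and IoU. The Python below is a minimal NumPy sketch of those two ideas under simplifying assumptions. The function names `cross_temporal_attention` and `f1_and_iou` are hypothetical, the attention is single-head with no learned projections, and the pyramid (multi-scale) part of PCTA is not modeled, so this should not be read as the CT2RCDSS implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_temporal_attention(feat_t1, feat_t2):
    """Generic scaled dot-product cross-attention between two time phases.

    Queries come from the time-1 tokens; keys and values come from the
    time-2 tokens. Both inputs have shape (num_tokens, dim).
    Assumption: no learned Q/K/V projections and a single head.
    """
    dim = feat_t1.shape[-1]
    scores = feat_t1 @ feat_t2.T / np.sqrt(dim)   # (n1, n2) similarities
    weights = softmax(scores, axis=-1)            # each row sums to 1
    attended = weights @ feat_t2                  # (n1, dim)
    return attended, weights


def f1_and_iou(pred, target):
    """F1-score and IoU of the 'changed' class for binary change maps."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))    # changed pixels found
    fp = int(np.sum(pred & ~target))   # false alarms
    fn = int(np.sum(~pred & target))   # missed changes
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return f1, iou


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t1 = rng.normal(size=(4, 8))   # 4 tokens from the earlier image
    t2 = rng.normal(size=(6, 8))   # 6 tokens from the later image
    attended, weights = cross_temporal_attention(t1, t2)
    print(attended.shape, weights.shape)

    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [1, 0]]
    print(f1_and_iou(pred, target))
```

For a binary change map, the two metrics are tied by IoU = F1 / (2 - F1), so they always rank methods in the same order. Reporting both, as the abstract does, is a convention rather than two independent measurements.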