A Semi-Supervised Pyramid Cross-Temporal Attention Transformer for Change Detection in High-Resolution Remote Sensing Images

Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, pp. 1-5
Main authors: Lv, Pengyuan; Li, Mengchen; Zhong, Yanfei
Format: Article
Language: English
Description
Abstract: The vision transformer (ViT) model can capture long-range dependencies in imagery and has been studied for remote sensing image change detection (CD). However, existing transformer-based CD methods perform unsatisfactorily when labeled data are limited. The original self-attention (SA) mechanism cannot effectively extract change information, and the large number of parameters in the ViT model makes it difficult to train. To address these problems, this letter proposes a semi-supervised pyramid cross-temporal attention transformer for change detection (CT2RCDSS). CT2RCDSS follows an encoder-decoder structure. The encoder uses a dual-branch structure that combines the pyramid cross-temporal attention (PCTA) and pyramid SA (PSA) mechanisms; it is designed to model the interaction between features from different time phases and to enhance changes at different scales. The decoder uses a series of deconvolutional layers with skip connections, followed by a Softmax layer that produces the final binary change map. In addition, a semi-supervised training strategy, which reduces errors in the pseudo-labels generated by models initialized with different parameters, is used to improve model stability when unlabeled data are exploited. Experiments showed that the proposed method achieves a superior F1-score and intersection over union (IoU), which indicates its potential.
ISSN: 1545-598X, 1558-0571
DOI: 10.1109/LGRS.2024.3404645
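The abstract does not give the exact formulation of the pyramid cross-temporal attention (PCTA) branch. As a rough illustration of the underlying idea only, the sketch below assumes standard single-head scaled dot-product attention in which the queries come from one date's tokens and the keys and values come from the other date's tokens, so each date's features are re-expressed in terms of the other. The function name, the (N, C) token layout, and the randomly initialized projection matrices are all hypothetical, and the pyramid (multi-scale) part of PCTA is not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_temporal_attention(f_src, f_other, w_q, w_k, w_v):
    """Hypothetical single-head cross attention between two acquisition dates.

    f_src, f_other: (N, C) token features of the two dates (assumed layout).
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical, random here).
    Returns the (N, D) attended features and the (N, N) attention map.
    """
    q = f_src @ w_q      # queries come from the source date
    k = f_other @ w_k    # keys come from the other date
    v = f_other @ w_v    # values come from the other date
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
N, C, D = 16, 8, 8  # e.g. a 4x4 feature map flattened into 16 tokens
f_t1 = rng.standard_normal((N, C))
f_t2 = rng.standard_normal((N, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) for _ in range(3))

# Attend in both directions: date 1 -> date 2 and date 2 -> date 1.
out_12, attn_12 = cross_temporal_attention(f_t1, f_t2, w_q, w_k, w_v)
out_21, attn_21 = cross_temporal_attention(f_t2, f_t1, w_q, w_k, w_v)
print(out_12.shape, attn_12.shape)  # (16, 8) (16, 16)
```

Running the attention in both directions gives each date a representation conditioned on the other. How the letter fuses this with the pyramid SA branch and builds the pyramid scales is not stated in the abstract.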
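The abstract states that the semi-supervised strategy reduces errors in pseudo-labels produced by models initialized with different parameters, but it does not say how. One plausible reading, sketched below purely as an assumption and not as the authors' method, is to keep a pixel's pseudo-label only where two differently initialized models agree on the class and are both confident, and to mark every other pixel as ignored in the unsupervised loss. The 0.9 threshold, the ignore value 255, and the helper names are hypothetical.

```python
import numpy as np

IGNORE = 255  # hypothetical ignore index for pixels excluded from the loss

def agreed_pseudo_labels(prob_a, prob_b, threshold=0.9):
    """Hypothetical agreement filter over two models' softmax outputs.

    prob_a, prob_b: (2, H, W) class probabilities (0 = unchanged, 1 = changed).
    A pixel keeps a pseudo-label only if both models predict the same class
    and both are at least `threshold` confident; otherwise it becomes IGNORE.
    """
    label_a, label_b = prob_a.argmax(axis=0), prob_b.argmax(axis=0)
    conf_a, conf_b = prob_a.max(axis=0), prob_b.max(axis=0)
    keep = (label_a == label_b) & (conf_a >= threshold) & (conf_b >= threshold)
    pseudo = np.full(label_a.shape, IGNORE, dtype=np.uint8)
    pseudo[keep] = label_a[keep]
    return pseudo

def two_class(p_changed):
    # Build a (2, 1, W) probability map from per-pixel "changed" probabilities.
    p = np.asarray(p_changed, dtype=float)[None, :]
    return np.stack([1.0 - p, p])

# Pixels: confident agreement (changed), low confidence, disagreement,
# confident agreement (unchanged).
prob_a = two_class([0.95, 0.60, 0.97, 0.03])
prob_b = two_class([0.98, 0.70, 0.02, 0.05])
pseudo = agreed_pseudo_labels(prob_a, prob_b)
print(pseudo.tolist())  # [[1, 255, 255, 0]]
```

Filtering by agreement trades coverage for label quality: fewer unlabeled pixels contribute, but those that do are less likely to carry a wrong pseudo-label.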
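The abstract reports results as F1-score and intersection over union (IoU) without defining them. The sketch below uses the usual definitions for the "changed" class of a binary change map, computed from pixel counts of true positives (TP), false positives (FP), and false negatives (FN). It assumes per-image pixel counting; the letter may aggregate counts differently, for example over a whole test set.

```python
import numpy as np

def change_f1_iou(pred, ref):
    """F1-score and IoU of the 'changed' class (1 = changed, 0 = unchanged)."""
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    tp = np.sum(pred & ref)    # changed in both prediction and reference
    fp = np.sum(pred & ~ref)   # predicted changed, actually unchanged
    fn = np.sum(~pred & ref)   # missed changes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return float(f1), float(iou)

pred = np.array([[1, 1, 0], [0, 1, 0]])
ref = np.array([[1, 0, 0], [0, 1, 1]])
f1, iou = change_f1_iou(pred, ref)
print(round(f1, 4), round(iou, 4))  # 0.6667 0.5
```

For a single binary map the two metrics are linked by F1 = 2·IoU / (1 + IoU), so here IoU = 0.5 gives F1 = 2/3.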