A Semi-Supervised Pyramid Cross-Temporal Attention Transformer for Change Detection in High-Resolution Remote Sensing Images
Published in: | IEEE Geoscience and Remote Sensing Letters 2024, Vol.21, p.1-5 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | The vision transformer (ViT) can model long-range dependencies in imagery and has been studied for remote sensing image change detection (CD). However, existing transformer-based CD methods do not perform satisfactorily when labeled data are limited. The original self-attention (SA) mechanism cannot effectively extract change information, and the large number of parameters makes the ViT difficult to train. To address these problems, this letter proposes a semi-supervised pyramid cross-temporal attention transformer for change detection (CT2RCDSS). CT2RCDSS follows an encoder-decoder structure. The encoder uses a dual-branch structure that combines pyramid cross-temporal attention (PCTA) and pyramid SA (PSA) mechanisms, designed to capture the interaction between features from different time phases and to enhance changes at different scales. The decoder uses a series of deconvolutional layers with skip connections, followed by a Softmax layer that produces the final binary change map. In addition, a semi-supervised training strategy, which reduces the errors in the pseudo-labels generated from models initialized with different parameters, improves model stability when using unlabeled data. Experiments showed that the proposed method achieves a superior F1-score and intersection over union (IoU), which indicates its potential. |
---|---|
ISSN: | 1545-598X 1558-0571 |
DOI: | 10.1109/LGRS.2024.3404645 |
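
The summary reports results as F1-score and intersection over union (IoU) on a binary change map. As a minimal sketch of how these two metrics are commonly computed for the "changed" class, the following uses the standard definitions F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The function name `change_f1_iou`, the nested-list input format, and the choice to return 0.0 when the denominator is zero are assumptions made for illustration; the letter's own evaluation code is not described in this record.

```python
from typing import Sequence, Tuple

# A binary change map as rows of 0 (unchanged) and 1 (changed).
BinaryMap = Sequence[Sequence[int]]


def change_f1_iou(pred: BinaryMap, ref: BinaryMap) -> Tuple[float, float]:
    """Return (F1, IoU) for the 'changed' class (pixels equal to 1).

    Hypothetical helper for illustration, not the authors' implementation.
    """
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("prediction and reference must have the same shape")

    tp = fp = fn = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p == 1 and r == 1:
                tp += 1  # change correctly detected
            elif p == 1 and r == 0:
                fp += 1  # false alarm
            elif p == 0 and r == 1:
                fn += 1  # missed change

    # Assumed convention: score 0.0 when there are no changed pixels
    # in either map, so the denominators would otherwise be zero.
    f1_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    iou = tp / iou_den if iou_den else 0.0
    return f1, iou


# Toy 2x3 example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
ref = [[1, 0, 0], [0, 1, 1]]
f1, iou = change_f1_iou(pred, ref)
```

For a single class the two metrics are monotonically related by F1 = 2·IoU / (1 + IoU), so they rank methods identically on one image pair, while IoU penalizes errors more heavily in absolute terms.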