TACD: A Novel 3-D Swin Transformer With Enhanced Feature Aggregation for Change Detection in Image Time Series

Bibliographic details
Published in:	IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, p. 1-16
Main authors:	Mao, Yin; He, Qiuhua; Li, Jianlong; Yang, Bin
Format:	Article
Language:	English
Subjects:	3-D Swin Transformer (3DST); Aggregation; Artificial neural networks; Change detection; change detection (CD); Coders; Context; Context modeling; Decoding; Deep layer; Deep learning; Distillation; feature aggregation; Feature extraction; Formability; Image enhancement; Image quality; image time series (ITS); Information processing; Knowledge management; local-global representation; Machine learning; Modelling; Modules; Neural networks; Semantics; Silver; Solid modeling; Spatial data; Spatiotemporal phenomena; Temporal variations; Three-dimensional displays; Time series; Time series analysis; Transformers; Transient analysis

Description
Abstract:	Change detection (CD) in image time series (ITS) improves change information quality by modeling change patterns and mitigating seasonal interferences. Recent deep learning (DL)-based methods have made significant progress in this field. However, most of them are based on convolutional neural networks (CNNs) and are limited in extracting global context. Although newly applied transformer-based methods excel at modeling long-range temporal dependencies in ITS, they make insufficient use of spatial information. Given this, we propose a novel 3-D Swin Transformer with enhanced feature aggregation for ITS CD (TACD), which can comprehensively model long-range spatiotemporal dependencies in ITS. It consists of three main parts: 1) A deformable local context extraction (DLCE) module is integrated in TACD. It embeds ITS into 3-D tokens with rich local contextual information, which helps TACD tackle multiscale ground changes; 2) A UNet-structured 3-D Swin Transformer (3D ST-UNet) is proposed for global modeling. It extracts global spatial-spectral-temporal features from 3-D tokens through an encoder and decoder based on 3-D self-attention and models long-range spatiotemporal dependencies effectively; and 3) An enhanced feature aggregation strategy, including the time difference (TD) and self-distillation (SD) modules, is designed to promote change modeling. The TD module facilitates feature transmission from the encoder to the decoder by modeling transient changes, while the SD module transfers hidden knowledge in the ITS into shallow and deep layers, which capture rich contextual information and enhance feature discrimination. TACD outperforms state-of-the-art approaches on the OSCD and SpaceNet7 datasets. Experiments on ITS CD and bitemporal CD demonstrate its effectiveness. Our code is available at https://github.com/silver_m/TACD.
ISSN:	0196-2892 1558-0644
DOI:	10.1109/TGRS.2024.3496994