A Divided Spatial and Temporal Context Network for Remote Sensing Change Detection

Bibliographic Details
Published in:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, Vol. 15, pp. 4897-4908
Main authors:	Shi, Nian; Chen, Keming; Zhou, Guangyao
Format:	Article
Language:	English
Subjects:	Change detection; Computer applications; Context; Context modeling; convolutional neural networks; Decoding; Detection; Feature extraction; Mathematical models; Methods; Modelling; Parameters; Remote sensing; self-attention; Spatial data; spatial-temporal transformer; Task analysis; Transformers; Visualization
Online access:	Full text

Description
Abstract:	Recently, change detection has become one of the central tasks in remote sensing image analysis. Owing to their strong discriminative ability, various convolution-based approaches have been applied to change detection and have shown favorable performance. However, these approaches either require numerous parameters to obtain refined features or cannot make full use of global context information, which is crucial for change detection. Motivated by recently proposed visual transformers, we introduce a divided spatial and temporal context modeling network to address these shortcomings. The network is token-based and passes global context through well-modeled tokens. Specifically, to model the spatial context, we first use spatial self-attention so that each token implicitly incorporates the spatial information of the corresponding image. A subsequent temporal self-attention then models the temporal context. Together with the spatial self-attention, it makes the learned tokens contain the global context and become more representative and better suited to change detection. Finally, a prediction head outputs change detection results over the token space without an additional transformer decoder or skip connections between features and tokens, thus reducing the number of model parameters and the computational cost. Thanks to the strong global context modeling capability of the proposed method, we further develop a simplified variant with far fewer parameters and only a slight drop in F1 and IoU scores. In our experiments, the proposed method shows competitive performance and surpasses several state-of-the-art methods.
ISSN:	1939-1404; 2151-1535
DOI:	10.1109/JSTARS.2022.3176858
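
The abstract describes attention applied in two divided steps: first among the tokens of one image (spatial), then between the two acquisition dates (temporal). The following is a minimal sketch of that general pattern only, not the authors' implementation. It assumes identity query/key/value projections, no residual connections or normalization, and that temporal attention pairs tokens with the same index. The names `self_attention` and `divided_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: Q, K and V are the tokens themselves (no learned weights).
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, tokens))
             for j in range(dim)]
        )
    return attended


def divided_attention(tokens_t1, tokens_t2):
    """Spatial attention within each image, then temporal attention across dates."""
    # Spatial step: tokens of the same image attend to each other.
    spatial_t1 = self_attention(tokens_t1)
    spatial_t2 = self_attention(tokens_t2)

    # Temporal step: each token attends over itself and its counterpart
    # (same token index) from the other acquisition date.
    out_t1, out_t2 = [], []
    for token_a, token_b in zip(spatial_t1, spatial_t2):
        pair = self_attention([token_a, token_b])
        out_t1.append(pair[0])
        out_t2.append(pair[1])
    return out_t1, out_t2


# Two toy "images", each summarized by two 2-dimensional tokens.
tokens_before = [[1.0, 0.0], [0.0, 1.0]]
tokens_after = [[1.0, 0.0], [1.0, 1.0]]
context_before, context_after = divided_attention(tokens_before, tokens_after)
```

In this sketch the token count and dimension are unchanged by both steps, and identical input images yield identical output tokens, so any difference between `context_before` and `context_after` comes from differences between the two dates. A prediction head over the tokens, as mentioned in the abstract, is not shown here.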