Semantic-aware transformer with feature integration for remote sensing change detection


Bibliographic details
Published in: Engineering applications of artificial intelligence, 2024-09, Vol. 135, p. 108774, Article 108774
Main authors: Li, Penglei; Si, Tongzhen; Ye, Chuanlong; Guo, Qingbei
Format: Article
Language: English
Description
Abstract: Change detection (CD) aims to detect changed objects of interest in bi-temporal images and is an active research direction because of its value to human society. Existing CD methods usually employ convolution or transformer structures to extract image features. However, they neglect the shortcomings and complementary advantages of the different structures. In addition, the relationship between bi-temporal input features is not adequately modeled, which makes it difficult to learn features of changed regions. To this end, we propose a novel Semantic-aware Transformer with Feature Integration (STFI) method for remote sensing change detection, which correlates bi-temporal features by exploiting the complementary advantages of convolution and transformer structures. Specifically, a residual module extracts local information in the shallow layers, while a transformer module models long-distance dependencies in the deep layers. Linking convolution and transformer operations in this way lets the model learn complementary local and global features. In addition, we integrate high-level bi-temporal semantic information in two ways. First, a feature generation module generates fused features in the local view from the convolutional operation. Second, a feature integration module fuses global features from the transformer operation. Furthermore, we develop an objective function with two terms, the cross-entropy loss and the dice loss, to optimize the whole CD model. Extensive experiments demonstrate that the proposed STFI surpasses other competing methods by 1.18% and 1.99% on LEVIR-CD and by 3.05% and 5.15% on WHU-CD in terms of F1 and IoU, respectively.
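The abstract reports results as F1 and IoU but does not spell out how they are computed. The following is a minimal sketch of the usual definitions for the "changed" class of a binary change map; the function name `change_metrics` and the flat 0/1 input format are assumptions made for illustration, not details taken from the article.

```python
def change_metrics(pred, target):
    """Return (precision, recall, f1, iou) for the 'changed' class.

    pred and target are flat sequences of 0/1 labels (1 = changed).
    This input format is an assumption for illustration only.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)

    # Guard against empty denominators (no predicted or no true changes).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return precision, recall, f1, iou
```

Under these definitions the two scores are tied by IoU = F1 / (2 - F1), so a given gain in F1 generally corresponds to a larger gain in IoU.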
• We fuse convolution and transformer modules to model the local and global contexts.
• We design feature generation/integration modules to correlate bi-temporal features.
• We develop a function with the cross-entropy and dice losses to optimize the model.
• We provide an in-depth analysis and prove that STFI achieves superior performance.
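The record describes an objective that combines a cross-entropy term with a dice term. A rough sketch of such a combination is given below, assuming a binary formulation over per-pixel change probabilities. The equal default weights, the smoothing constant, and all function names are assumptions for illustration; the record does not state the authors' exact formulation.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat sequences of probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft dice loss: 1 - (2 * overlap + smooth) / (sum(probs) + sum(targets) + smooth)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, ce_weight=1.0, dice_weight=1.0):
    """Weighted sum of the two terms; the weights here are assumed, not from the article."""
    return ce_weight * bce_loss(probs, targets) + dice_weight * dice_loss(probs, targets)
```

The cross-entropy term scores each pixel independently, while the dice term scores the overlap between the predicted and true changed regions, which is commonly used when changed pixels are a small fraction of the image.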
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2024.108774