ESMS-Net: Enhancing Semantic-Mask Segmentation Network With Pyramid Atrousformer for Remote Sensing Image

Bibliographic Details
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, p. 1-14
Main authors: Liu, Jiamin; Wang, Ziyi; Luo, Fulin; Guo, Tan; Yang, Feng; Gao, Xinbo
Format: Article
Language: English
Description
Summary: Transformers have gained widespread adoption in remote sensing image (RSI) segmentation. However, RSIs contain densely overlapping terrain and significant shadow, making it challenging to segment the blended boundaries between terrain types, which constitute the hard classes. Currently, most transformer-based methods construct self-attention with a sliding window, which restricts the receptive field and makes it harder to handle intersecting and overlapping objects. Additionally, they rarely focus specifically on the representation of these hard-to-segment objects. To overcome these challenges, we propose a novel Enhancing Semantic Mask Segmentation Network (ESMS-Net) framework comprising a local-global joint encoder, an auxiliary enhanced encoder, and a multiscale dense decoder. In the local-global joint encoder, we construct a Pyramid Pooling AtrousFormer (PPAFormer) that performs self-attention with a pyramid-structured atrous sliding window, which enlarges the receptive field and improves global representation. Meanwhile, we construct the dual-feature fusion module (DFFM) and multilevel feature weighted fusion (MFWF) in the multiscale dense decoder to reduce information loss and facilitate the interaction of deep semantic information. For the auxiliary enhanced encoder, we develop a semantic mask based on the predicted results to retain the hard segmentation classes, and then use the same structure as the first two stages of the local-global joint encoder to relearn the hard regions. Extensive experiments demonstrate that the proposed ESMS-Net achieves significant improvements in segmentation performance over state-of-the-art methods on the ISPRS Vaihingen and Potsdam datasets. The code will be available at https://github.com/Wzysaber/ESMS-Net.
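
The summary's central idea, an atrous (dilated) sliding window that widens the receptive field without adding tokens, can be illustrated with a small sketch. This is not the authors' PPAFormer: the function names, the 3 x 3 window, and the dilation rates (1, 2, 4) are assumptions chosen only for illustration, and the record does not state the actual values. The sketch shows only which positions an attention window would gather at each rate.

```python
def atrous_window(center, size, dilation, height, width):
    """Positions (row, col) of a size x size window around `center`,
    with neighbouring samples `dilation` pixels apart (size assumed odd).
    Positions outside the height x width grid are dropped."""
    r0, c0 = center
    half = size // 2
    positions = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = r0 + dr * dilation, c0 + dc * dilation
            if 0 <= r < height and 0 <= c < width:
                positions.append((r, c))
    return positions


def span(size, dilation):
    """Side length, in pixels, covered by a dilated window."""
    return (size - 1) * dilation + 1


# Hypothetical pyramid of dilation rates (an assumption, not from the paper).
PYRAMID_RATES = (1, 2, 4)

for rate in PYRAMID_RATES:
    tokens = atrous_window((16, 16), size=3, dilation=rate, height=32, width=32)
    # Token count stays at 9 while the covered side length grows: 3, 5, 9.
    print(rate, len(tokens), span(3, rate))
```

Each rate attends over the same nine tokens, so the attention cost per window is unchanged, while the covered area grows from 3 x 3 to 9 x 9 pixels. Combining several rates is one plausible reading of "pyramid-structured" in the summary.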
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2024.3504733