A Mamba-Diffusion Framework for Multimodal Remote Sensing Image Semantic Segmentation
Published in: IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, pp. 1-5
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Deep learning has driven significant progress in multimodal remote sensing semantic segmentation. However, current methods struggle to maintain geometric consistency, particularly for large objects, resulting in fragmented segmentation masks. We propose a Mamba-diffusion framework that preserves geometric consistency in segmentation masks by introducing a generative diffusion-based semantic segmentation pipeline and developing a Mamba-based multimodal fusion model. The fusion model fuses the multimodal images across multiple scales and scanning mechanisms with a double cross-fusion (DCF) module; the cross-modal information is then further integrated by a dual-splitting structured state-space (DS-S4) model. Finally, the diffusion-based segmentation pipeline predicts semantic masks by progressively refining random Gaussian noise, guided by the fused multimodal features. Experimental results on the WHU-OPT-SAR and Hunan datasets demonstrate that the proposed framework surpasses state-of-the-art (SOTA) methods by a considerable margin. Our code is available at https://github.com/WenliangDu/MambaDiffusion.
ISSN: 1545-598X, 1558-0571
DOI: 10.1109/LGRS.2024.3476269
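To make the pipeline described in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch of a diffusion-based segmentation head that progressively refines Gaussian noise into a semantic mask, conditioned on fused multimodal features. It is not the authors' released implementation (see the linked repository for that): all module names, shapes, and parameters such as `feat_dim` and `num_classes` are assumptions, and the Mamba-based fusion (DCF and DS-S4) is abstracted here as a precomputed fused feature map.

```python
# Hypothetical sketch (not the paper's code): DDPM-style refinement of a
# segmentation mask, guided by fused multimodal (e.g. optical + SAR) features.
import torch
import torch.nn as nn


class ConditionalDenoiser(nn.Module):
    """Predicts the noise in a noisy mask, conditioned on fused features."""

    def __init__(self, num_classes: int, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes + feat_dim, hidden, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(hidden, num_classes, 3, padding=1),
        )
        self.t_embed = nn.Embedding(1000, hidden)  # simple learned timestep embedding

    def forward(self, noisy_mask, fused_feat, t):
        x = torch.cat([noisy_mask, fused_feat], dim=1)
        # Inject the timestep embedding after the first conv, broadcast spatially.
        h = self.net[0](x) + self.t_embed(t)[:, :, None, None]
        for layer in self.net[1:]:
            h = layer(h)
        return h  # predicted noise, same shape as noisy_mask


@torch.no_grad()
def ddpm_sample(denoiser, fused_feat, num_classes, steps=50):
    """Progressively refine random Gaussian noise into per-pixel class labels."""
    b, _, h, w = fused_feat.shape
    x = torch.randn(b, num_classes, h, w, device=fused_feat.device)
    betas = torch.linspace(1e-4, 0.02, steps, device=fused_feat.device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for i in reversed(range(steps)):
        t = torch.full((b,), i, device=x.device, dtype=torch.long)
        eps = denoiser(x, fused_feat, t)
        # Standard DDPM posterior mean; add noise at every step except the last.
        x = (x - betas[i] / torch.sqrt(1.0 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x.argmax(dim=1)  # per-pixel class prediction


if __name__ == "__main__":
    denoiser = ConditionalDenoiser(num_classes=8, feat_dim=32)
    fused = torch.randn(1, 32, 64, 64)  # stand-in for DCF/DS-S4 fused features
    mask = ddpm_sample(denoiser, fused, num_classes=8)
    print(mask.shape)  # torch.Size([1, 64, 64])
```

The sketch only illustrates the generative direction of the pipeline: conditioning every denoising step on the same fused feature map is what lets the mask inherit the geometric structure of the inputs, which is the property the abstract credits for reducing fragmented masks on large objects.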