A Mamba-Diffusion Framework for Multimodal Remote Sensing Image Semantic Segmentation
Published in: IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, pp. 1-5
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Deep learning has driven significant progress in multimodal remote sensing semantic segmentation. However, current methods struggle to maintain geometric consistency, particularly for large objects, resulting in fragmented segmentation masks. We propose a Mamba-diffusion framework that preserves geometric consistency in segmentation masks by introducing a generative diffusion-based semantic segmentation pipeline and developing a Mamba-based multimodal fusion model. The fusion model fuses the multimodal images across multiple scales and scanning mechanisms with a double cross-fusion (DCF) module; the cross-modal information is then further integrated by a dual-splitting structured state-space (DS-S4) model. Finally, the diffusion-based segmentation pipeline predicts semantic masks by progressively refining random Gaussian noise, guided by the fused multimodal features. Experimental results on the WHU-OPT-SAR and Hunan datasets demonstrate that the proposed framework surpasses state-of-the-art (SOTA) methods by a considerable margin. Our code is available at https://github.com/WenliangDu/MambaDiffusion.
ISSN: 1545-598X, 1558-0571
DOI: 10.1109/LGRS.2024.3476269
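To make the pipeline described in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch of a diffusion-based segmentation head that progressively refines Gaussian noise into a semantic mask, conditioned on fused multimodal features. It is not the authors' released implementation (see the linked repository for that): all module names, shapes, and parameters such as `feat_dim` and `num_classes` are assumptions, and the Mamba-based fusion (DCF and DS-S4) is abstracted here as a precomputed fused feature map.

```python
# Hypothetical sketch (not the paper's code): DDPM-style refinement of a
# segmentation mask, guided by fused multimodal (e.g. optical + SAR) features.
import torch
import torch.nn as nn


class ConditionalDenoiser(nn.Module):
    """Predicts the noise in a noisy mask, conditioned on fused features."""

    def __init__(self, num_classes: int, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes + feat_dim, hidden, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(hidden, num_classes, 3, padding=1),
        )
        self.t_embed = nn.Embedding(1000, hidden)  # simple learned timestep embedding

    def forward(self, noisy_mask, fused_feat, t):
        x = torch.cat([noisy_mask, fused_feat], dim=1)
        # Inject the timestep embedding after the first conv, broadcast spatially.
        h = self.net[0](x) + self.t_embed(t)[:, :, None, None]
        for layer in self.net[1:]:
            h = layer(h)
        return h  # predicted noise, same shape as noisy_mask


@torch.no_grad()
def ddpm_sample(denoiser, fused_feat, num_classes, steps=50):
    """Progressively refine random Gaussian noise into per-pixel class labels."""
    b, _, h, w = fused_feat.shape
    x = torch.randn(b, num_classes, h, w, device=fused_feat.device)
    betas = torch.linspace(1e-4, 0.02, steps, device=fused_feat.device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for i in reversed(range(steps)):
        t = torch.full((b,), i, device=x.device, dtype=torch.long)
        eps = denoiser(x, fused_feat, t)
        # Standard DDPM posterior mean; add noise at every step except the last.
        x = (x - betas[i] / torch.sqrt(1.0 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x.argmax(dim=1)  # per-pixel class prediction


if __name__ == "__main__":
    denoiser = ConditionalDenoiser(num_classes=8, feat_dim=32)
    fused = torch.randn(1, 32, 64, 64)  # stand-in for DCF/DS-S4 fused features
    mask = ddpm_sample(denoiser, fused, num_classes=8)
    print(mask.shape)  # torch.Size([1, 64, 64])
```

The sketch only illustrates the generative direction of the pipeline: conditioning every denoising step on the same fused feature map is what lets the mask inherit the geometric structure of the inputs, which is the property the abstract credits for reducing fragmented masks on large objects.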