Multimodal Remote Sensing Image Segmentation With Intuition-Inspired Hypergraph Modeling

Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2023, Vol. 32, pp. 1474-1487
Authors: He, Qibin; Sun, Xian; Diao, Wenhui; Yan, Zhiyuan; Yao, Fanglong; Fu, Kun
Format: Article
Language: English
Description
Abstract: Multimodal remote sensing (RS) image segmentation aims to comprehensively utilize multiple RS modalities to assign pixel-level semantics to the studied scenes, which can provide a new perspective for global city understanding. Multimodal segmentation inevitably encounters the challenge of modeling intra- and inter-modal relationships, i.e., object diversity and modal gaps. However, previous methods are usually designed for a single RS modality and are limited by noisy collection environments and poorly discriminative information. Neuropsychology and neuroanatomy confirm that the human brain performs the guiding perception and integrative cognition of multimodal semantics through intuitive reasoning. Therefore, establishing an intuition-inspired semantic understanding framework for multimodal RS segmentation is the main motivation of this work. Driven by the superiority of hypergraphs in modeling high-order relationships, we propose an intuition-inspired hypergraph network (I²HN) for multimodal RS segmentation. Specifically, we present a hypergraph parser that imitates guiding perception to learn intra-modal object-wise relationships. It parses the input modality into irregular hypergraphs to mine semantic clues and generate robust mono-modal representations. In addition, we design a hypergraph matcher that dynamically updates the hypergraph structure from the explicit correspondence of visual concepts, similar to integrative cognition, to improve cross-modal compatibility when fusing multimodal features. Extensive experiments on two multimodal RS datasets show that the proposed I²HN outperforms state-of-the-art models, achieving F1/mIoU accuracy of 91.4%/82.9% on the ISPRS Vaihingen dataset and 92.1%/84.2% on the MSAW dataset.
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2023.3245324
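
The abstract outlines two components: a hypergraph parser that groups pixels of each modality into hyperedges and propagates features within them, and a hypergraph matcher that aligns hyperedges across modalities before fusion. For orientation only, below is a minimal NumPy sketch of the generic pattern this family of models builds on: k-means-style hyperedge construction, the standard hypergraph convolution Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X W, and a prototype-correspondence fusion step. All shapes, names, and design choices here are illustrative assumptions; this is not the authors' I²HN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_incidence(feats, n_edges, iters=10):
    """Group pixels into hyperedges with a tiny k-means; returns an N x E incidence matrix and edge centers."""
    n = feats.shape[0]
    centers = feats[rng.choice(n, n_edges, replace=False)].copy()
    for _ in range(iters):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # N x E squared distances
        assign = dists.argmin(axis=1)
        for e in range(n_edges):
            members = feats[assign == e]
            if len(members):
                centers[e] = members.mean(axis=0)
    H = np.zeros((n, n_edges))
    H[np.arange(n), assign] = 1.0
    return H, centers

def hypergraph_conv(X, H, W):
    """Hypergraph convolution Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X W, followed by ReLU."""
    dv = np.clip(H.sum(axis=1), 1e-6, None)   # node degrees
    de = np.clip(H.sum(axis=0), 1e-6, None)   # hyperedge degrees
    A = np.diag(dv ** -0.5) @ H @ np.diag(1.0 / de) @ H.T @ np.diag(dv ** -0.5)
    return np.maximum(A @ X @ W, 0.0)

# Two toy "modalities" of the same scene, flattened to N pixels x d channels (purely synthetic).
N, d, E = 16 * 16, 8, 6
X_a = rng.normal(size=(N, d))   # e.g. optical features (placeholder)
X_b = rng.normal(size=(N, d))   # e.g. SAR / DSM features (placeholder)

# Parser-like step: build per-modality hyperedges and propagate features within each modality.
H_a, C_a = build_incidence(X_a, E)
H_b, C_b = build_incidence(X_b, E)
W = rng.normal(size=(d, d)) * 0.1
Z_a = hypergraph_conv(X_a, H_a, W)
Z_b = hypergraph_conv(X_b, H_b, W)

# Matcher-like step: soft correspondence between hyperedge prototypes of the two
# modalities (cosine similarity + softmax), used to route modality-b hyperedge
# features back onto modality-a pixels before fusion.
norm = lambda M: M / np.linalg.norm(M, axis=1, keepdims=True)
S = norm(C_a) @ norm(C_b).T                              # E x E prototype similarity
P = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)     # row-wise softmax
edge_feats_b = H_b.T @ Z_b / np.clip(H_b.sum(axis=0), 1.0, None)[:, None]  # E x d edge averages
Z_fused = Z_a + H_a @ (P @ edge_feats_b)                 # N x d fused per-pixel features
print(Z_fused.shape)  # (256, 8)
```

The fixed clustering and cosine matching above only make the data flow concrete; the paper's actual parser and matcher are learned components of the network.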