Frequency-aware robust multidimensional information fusion framework for remote sensing image segmentation
Published in: Engineering Applications of Artificial Intelligence, 2024-03, Vol. 129, Article 107638
Main authors:
Format: Article
Language: English
Online access: Full text
Abstract: Urban scene image segmentation is an important research area in high-resolution remote sensing image processing. However, because urban scenes have complex three-dimensional structure, interference factors such as occlusion, shadow, intra-class inconsistency, and inter-class indistinction degrade segmentation performance. Many methods combine local and global information using CNNs and Transformers and achieve high performance on remote sensing image segmentation tasks, but they are unstable when dealing with these interference factors. Recent studies have found that semantic segmentation is highly sensitive to frequency information, so we introduce frequency information to let the model learn different categories of targets more comprehensively across multiple dimensions. By modeling targets with local features, global information, and frequency information, target features can be learned in multiple dimensions, reducing the impact of interference factors on the model and improving its robustness. In this paper, we consider frequency information in addition to combining CNNs and Transformers, and propose a Multidimensional Information Fusion Network (MIFNet) for high-resolution remote sensing image segmentation of urban scenes. Specifically, we design an information fusion Transformer module that adaptively associates local features, global semantic information, and frequency information, and a relevant semantic aggregation module that aggregates features at different scales to construct the decoder. By aggregating image features at different depths, the specific representation of each target and the correlations between targets can be modeled in multiple dimensions, allowing the network to better recognize and understand the features of each target class and thereby resist the interference factors that degrade segmentation performance. We conducted extensive ablation and comparison experiments on the ISPRS Vaihingen and ISPRS Potsdam benchmarks to verify the proposed method. Our method achieved the best results, with mIoU scores of 84.53% on Vaihingen and 87.3% on Potsdam, demonstrating its superiority. The source code will be available at https://github.com/JunyuFan/MIFNet.
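To make the fusion idea in the abstract concrete, below is a minimal PyTorch sketch of adaptively combining a local (convolutional) branch, a global (self-attention) branch, and a frequency branch. It is not the authors' MIFNet code: the FFT-magnitude frequency branch, the softmax gating scheme, and all module names and sizes are assumptions chosen only to illustrate the technique.

```python
# Hypothetical sketch (not the authors' released MIFNet implementation):
# fuse local CNN features, global Transformer attention, and FFT-based
# frequency features with per-pixel adaptive weights.
import torch
import torch.nn as nn


class FrequencyBranch(nn.Module):
    """Frequency representation via the 2-D FFT magnitude (an assumed design)."""

    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Magnitude spectrum of each feature map; log scale for numerical stability.
        freq = torch.fft.fft2(x, norm="ortho").abs()
        return self.proj(torch.log1p(freq))


class FusionBlock(nn.Module):
    """Adaptively associates local, global, and frequency features."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)  # local features
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.freq = FrequencyBranch(channels)
        self.gate = nn.Conv2d(3 * channels, 3, kernel_size=1)  # per-pixel branch weights

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local(x)
        # Flatten spatial dims to tokens for global self-attention.
        tokens = x.flatten(2).transpose(1, 2)                  # (B, HW, C)
        global_, _ = self.attn(tokens, tokens, tokens)
        global_ = global_.transpose(1, 2).reshape(b, c, h, w)
        freq = self.freq(x)
        # Softmax gate decides, per pixel, how much each branch contributes.
        weights = torch.softmax(self.gate(torch.cat([local, global_, freq], 1)), dim=1)
        return (weights[:, 0:1] * local
                + weights[:, 1:2] * global_
                + weights[:, 2:3] * freq)


if __name__ == "__main__":
    block = FusionBlock(channels=32)
    out = block(torch.randn(2, 32, 16, 16))
    print(out.shape)  # torch.Size([2, 32, 16, 16])
```

The gating keeps the three dimensions complementary rather than averaged: where shadow or occlusion corrupts local texture, the softmax can shift weight toward the global or frequency branch, which is one plausible reading of how multidimensional fusion resists such interference factors.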
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2023.107638