A Cascaded Network With Coupled High-Low Frequency Features for Building Extraction

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, Vol. 17, p. 10390-10406
Main authors: Chen, Xinyang; Xiao, Pengfeng; Zhang, Xueliang; Muhtar, Dilxat; Wang, Luhan
Format: Article
Language: English
Description
Abstract: Accurately extracting buildings from high-resolution remote sensing images is crucial for human productivity and livelihood in urban areas. Because buildings vary in scale and often have indistinct boundaries, building extraction from remote sensing images needs to fully leverage both high- and low-frequency features. However, previous studies have relied solely on either low- or high-frequency features, leading to errors such as omissions or internal holes in the detected buildings at various scales. Although some studies have integrated high- and low-frequency features, they overlook that different network depths suit the extraction of different frequency features. A novel network called Cascaded Inception Conv-Former Network (CICF-Net) is proposed in this study to solve these problems. It combines a convolutional neural network and a Transformer in parallel to efficiently extract high- and low-frequency features for building extraction. In the encoder, as the network depth grows, we gradually reduce the contribution of the high-frequency branch and enhance the focus on the low-frequency branch. Moreover, a cascaded fusion strategy is employed to extract and integrate multiscale high- and low-frequency features. As the decoder, we propose a gated convolution UperNet, which uses recursive gated convolution to facilitate multilevel spatial interactions and better restore fine-grained spatial details for building segmentation. The proposed CICF-Net achieves competitive accuracies on three public benchmarks: Massachusetts Building Dataset, WHU Aerial Building Dataset, and Inria Aerial Image Labeling Dataset, with IoU of 75.17%, 91.45%, and 81.28%, respectively. This provides strong evidence of its effectiveness in building extraction, as it can accurately capture spatial details and context of buildings.
ISSN: 1939-1404, 2151-1535
DOI:10.1109/JSTARS.2024.3403882
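
The abstract reports accuracy as IoU (intersection over union) between predicted and reference building masks. As a rough illustration of that metric only, and not the authors' evaluation code, the sketch below computes IoU for two small binary masks. The function name `binary_iou`, the nested-list mask format, and the convention for two empty masks are assumptions made here for the example.

```python
def binary_iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not taken from the paper.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel is in the intersection if both masks mark it as building.
            intersection += 1 if (p and t) else 0
            # A pixel is in the union if either mask marks it as building.
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 1 = building pixel, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

print(binary_iou(pred, target))  # 2 shared pixels / 4 pixels in the union = 0.5
```

Benchmark figures such as those in the abstract are typically aggregated over a whole test set; how the aggregation is done for this paper is not stated in this record.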