MLFMNet: A Multilevel Feature Mining Network for Semantic Segmentation on Aerial Images

Bibliographic Details
Published in:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, Vol. 17, pp. 16165-16179
Main authors:	Wei, Xinyu; Rao, Lei; Fan, Guangyu; Chen, Niansheng
Format:	Article
Language:	English
Subjects:	Accuracy; Aerial images; Coders; Convolution; Convolutional neural networks (CNNs); Decoding; Deformation; Feature extraction; Image enhancement; Image processing; Image reconstruction; Image segmentation; Modules; Multilevel; Receptive field; Semantic segmentation; Semantics; Target detection; Task analysis; Traffic management; Traffic planning; Transformers; Urban planning; Vectors
Online access:	Full text

Description
Summary:	Semantic segmentation of aerial images is crucial in practical applications such as traffic management, search tasks, and urban planning. However, the unique shooting angles of aerial images make accurate segmentation difficult: object scales vary widely, objects appear deformed, and small targets have unclear features. To address this, we propose MLFMNet, a multilevel feature mining network based on an encoder-decoder architecture, which mines and integrates multilevel feature information in aerial images to improve segmentation accuracy and robustness. MLFMNet uses skip connections to obtain hierarchical feature representations from the encoder. A learnable fusion module and a feature reconstruction module in the proposed decoder then progressively fuse and reconstruct these features to produce accurate semantic segmentation. To handle large size variations and deformations in objects, we design an irregular pyramid receptive field module, embedded at the bottom of the encoder, that captures receptive fields from multiple feature vectors and thus further mines abstract features. Moreover, to address the low segmentation and detection accuracy for small targets, a fine-grained feature mining module is embedded at the bottom of the decoder to capture spatial detail features. In particular, MLFMNet-B achieves an mIoU of 70.8%, ranking fourth on the official leaderboard of the UAVid test set.
ISSN:	1939-1404, 2151-1535
DOI:	10.1109/JSTARS.2024.3452250
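
The summary reports accuracy as mIoU (mean intersection over union). As a rough illustration of how that metric is commonly computed, the sketch below averages per-class IoU = TP / (TP + FP + FN) over a confusion matrix. The function names, the toy labels, and the three-class setup are hypothetical, and this record does not describe the exact evaluation protocol of the UAVid leaderboard, so treat this as the generic definition rather than the authors' evaluation code.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU, skipping classes absent from both labels and predictions."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]                                # pixels of class c predicted as c
        fn = sum(cm[c]) - tp                         # pixels of class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): six flattened pixels, three classes.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, preds, num_classes=3)
score = mean_iou(cm)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. In practice the confusion matrix is usually accumulated over every pixel of the whole test set before the per-class ratios are taken.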