ESDINet: Efficient Shallow-Deep Interaction Network for Semantic Segmentation of High-Resolution Aerial Images

Bibliographic details
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, p. 1-15
Main authors: Zhang, Xiangrong; Weng, Zhenhang; Zhu, Peng; Han, Xiao; Zhu, Jin; Jiao, Licheng
Format: Article
Language: English
Description
Summary: Semantic segmentation of high-resolution remote sensing images is essential in many fields. In practice, however, limited computational resources and complex network structures often prevent advanced semantic segmentation models from running efficiently, which has prompted research on lightweight models. Among lightweight semantic segmentation models, the two-branch architecture has been shown to perform well in both speed and accuracy. However, such architectures usually make insufficient use of shallow-layer information and therefore fail to supply the two branches with rich multiscale information efficiently. Their lightweight modules also struggle to extract global context from the features effectively. As a result, lightweight models still lag behind current advanced semantic segmentation models in accuracy. To address these problems, we propose a new lightweight dual-branch architecture, the efficient shallow-deep interaction network (ESDINet), which quickly extracts low-level spatial information and high-level semantic information from images through a detail branch and a semantic branch. Specifically, we construct an efficient dual-branch structure with different interactions at the shallow and deep stages to achieve multiscale information interaction. We also optimize the semantic branch and propose a new linear attention block that effectively improves its global perception. Extensive experiments show that our model achieves a good balance between segmentation accuracy and inference speed. In particular, ESDINet achieves 82.03% mean intersection over union (mIoU) on the Vaihingen test set at an inference speed of 116 frames/s (FPS) for 512×512 inputs on a single NVIDIA GTX 2080Ti GPU.
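The summary reports accuracy as mean intersection over union (mIoU). As a reading aid, the sketch below shows how that metric is commonly computed from a confusion matrix. It is not taken from the article: the function name `mean_iou`, the toy labels, and the choice to skip classes that appear in neither the labels nor the predictions are assumptions made for illustration, and the authors' evaluation protocol may differ.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union for two flat sequences of class indices."""
    # conf[t][p] counts pixels whose true class is t and predicted class is p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # predicted c, wrong
        fn = sum(conf[c]) - tp                                 # true c, missed
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 6 pixels, 3 hypothetical classes.
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
print(f"mIoU = {mean_iou(labels, predictions, 3):.4f}")  # prints "mIoU = 0.5000"
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so their mean is 0.5.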
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2024.3351437