FusFormer: global and detail feature fusion transformer for semantic segmentation of small objects

Bibliographic Details
Published in: Multimedia tools and applications, 2024, Vol. 83 (41), p. 88717-88744
Main Authors: Li, Zheng; Chen, Houjin; Li, Jupeng; Peng, Song; Zhang, Zhenhao; Wang, Baozheng; Wang, Changyong
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Summary: Improving the segmentation accuracy of small objects is essential for tasks such as autonomous driving and remote sensing. However, current mainstream semantic segmentation methods are inadequate for small objects. Accurate small-object segmentation requires both long-range global information and fine local details, and neither pure Convolutional Neural Networks (CNNs) nor Vision Transformers (ViTs) can effectively provide these two different types of information simultaneously. In this paper, we introduce a novel model, FusFormer, which contains a global branch and a detail branch to fully capture long-range features and spatial detail features from the input image. The global branch is based on MiT-B2 to efficiently acquire global context, while the detail branch acquires rich local detail information through a Spatial Prior Module (SPM) and a Multi-scale Module (MSM). A Feature Interaction Module (FIM) is proposed to fuse information across the two branches' features at dual scales. In addition, a Multi-scale Edge Extraction Module (MSEEM) supplements the edge information missing during model training, helping the model enhance the intra-class consistency of small objects. Extensive experiments on Cityscapes, ADE20K and PASCAL VOC 2012 show that our model achieves competitive overall segmentation accuracy, especially on small objects. FusFormer achieves 82.6%, 47.3% and 82.4% mIoU on the Cityscapes, ADE20K and PASCAL VOC 2012 validation sets, respectively; compared with other state-of-the-art methods, the proposed model significantly improves the IoU on small objects by 2%-4%.
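To make the dual-branch architecture the abstract describes more concrete, below is a minimal, hypothetical PyTorch sketch of the general idea: a transformer branch for global context fused with a convolutional branch for spatial detail. Everything here is a simplified stand-in rather than the authors' implementation: DetailBranch only approximates the role of the SPM and MSM, GlobalBranch replaces MiT-B2 with a plain patch-embedding transformer encoder, FeatureInteraction substitutes upsampling plus a 1x1 convolution for the paper's FIM, and the MSEEM edge supervision is omitted entirely; all class and parameter names are invented for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailBranch(nn.Module):
    # Stand-in for the detail branch: strided convs (playing the role of the
    # SPM) keep a 1/4-resolution feature map, and parallel dilated convs
    # (playing the role of the MSM) aggregate multi-scale local context.
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        )
        self.multi_scale = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)]
        )

    def forward(self, x):
        x = self.stem(x)  # (B, ch, H/4, W/4): detail-rich, high resolution
        return sum(conv(x) for conv in self.multi_scale) / len(self.multi_scale)

class GlobalBranch(nn.Module):
    # Stand-in for MiT-B2: a plain patch embedding plus a small transformer
    # encoder whose self-attention gives every token an image-wide view.
    def __init__(self, in_ch=3, dim=64, depth=2, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        t = self.patch_embed(x)                         # (B, dim, H/16, W/16)
        b, c, h, w = t.shape
        t = self.encoder(t.flatten(2).transpose(1, 2))  # (B, HW, dim) tokens
        return t.transpose(1, 2).reshape(b, c, h, w)

class FeatureInteraction(nn.Module):
    # Stand-in for the FIM: upsample the coarse global map to the detail
    # map's resolution and fuse the two streams with a 1x1 convolution.
    def __init__(self, ch=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, detail, global_feat):
        g = F.interpolate(global_feat, size=detail.shape[2:],
                          mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([detail, g], dim=1))

class DualBranchSegmenter(nn.Module):
    def __init__(self, num_classes=19, ch=64):
        super().__init__()
        self.detail = DetailBranch(ch=ch)
        self.global_branch = GlobalBranch(dim=ch)
        self.fim = FeatureInteraction(ch)
        self.head = nn.Conv2d(ch, num_classes, 1)

    def forward(self, x):
        fused = self.fim(self.detail(x), self.global_branch(x))
        logits = self.head(fused)                       # 1/4-scale class logits
        return F.interpolate(logits, size=x.shape[2:],
                             mode="bilinear", align_corners=False)

# One forward pass on a Cityscapes-sized crop (19 classes):
model = DualBranchSegmenter(num_classes=19)
out = model(torch.randn(1, 3, 256, 512))
print(out.shape)  # torch.Size([1, 19, 256, 512])

The intended takeaway is the complementarity the abstract argues for: the detail branch preserves a 1/4-resolution map in which small objects survive, while the global branch contributes image-wide context that the fusion step injects back at high resolution.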
ISSN: 1573-7721 (electronic); 1380-7501 (print)
DOI: 10.1007/s11042-024-18911-8