DARSegNet: A Real-Time Semantic Segmentation Method Based on Dual Attention Fusion Module and Encoder-Decoder Network

Bibliographic Details
Published in: Mathematical Problems in Engineering, 2022-06, Vol. 2022, p. 1-10
Main authors: Xing, Yongfeng; Zhong, Luo; Zhong, Xian
Format: Article
Language: English
Description
Summary: Convolutional neural networks achieve excellent semantic segmentation results on manually annotated datasets with complex scenes. However, semantic segmentation methods still suffer from low feature utilization and high computational complexity, and they remain far from practical real-time application. Two factors are critical to the semantic segmentation task: global context and multilevel semantics. However, obtaining both generally leads to high complexity. To address this, we propose a novel structure, the dual attention fusion module (DAFM), which eliminates structural redundancy. Unlike most existing algorithms, we combine the attention mechanism with a depth pyramid pool module (DPPM) to extract accurate dense features for pixel labeling, rather than relying on complex expansion (dilated) convolutions. Specifically, the DPPM applies a spatial pyramid structure at the output and combines it with global pooling. A DAFM is introduced in each decoder layer. Finally, the low-level and high-level features are fused to obtain the semantic segmentation result. Experiments and visualization results on the Cityscapes and CamVid datasets show that the method achieves a satisfactory balance between accuracy and speed in real-time semantic segmentation, which demonstrates the effectiveness of the proposed algorithm. In particular, on a single GTX 1080 Ti GPU, the ResNet-18-based model reaches 75.53% mIoU at 70 FPS on Cityscapes and 73.96% mIoU at 109 FPS on CamVid.
ISSN: 1024-123X, 1563-5147
DOI: 10.1155/2022/6195148
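
Note on the metric: the summary reports accuracy as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how this metric is commonly computed. It is not the authors' evaluation code. The function name `mean_iou`, the flattened-list input format, and the `ignore_index=255` default (a common convention for unlabeled pixels) are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over the classes that occur.

    pred and target are flat sequences of integer class labels, one per pixel.
    Pixels whose target label equals ignore_index are skipped
    (assumed convention, not taken from the paper).
    """
    inter = [0] * num_classes  # pixels where prediction and label agree on class c
    union = [0] * num_classes  # pixels where prediction or label is class c
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch adds one pixel to the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Average only over classes that appear in the prediction or the labels.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark toolkits usually accumulate intersections and unions over the whole test set before dividing, which can differ from averaging per-image scores.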