Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation

Bibliographic details
Published in:	Neural networks 2021-05, Vol.137, p.188-199
Main authors:	Peng, Chengli; Tian, Tian; Chen, Chen; Guo, Xiaojie; Ma, Jiayi
Format:	Article
Language:	eng
Subjects:	Attention mechanism; Deep learning; Real time; Semantic segmentation
Online access:	Full text

Description
Abstract:	The encoder–decoder structure has been introduced into semantic segmentation to improve the spatial accuracy of the network by fusing high- and low-level feature maps. However, recent state-of-the-art encoder–decoder-based methods can hardly meet real-time requirements because of their complex and inefficient decoders. To address this issue, we propose a lightweight bilateral attention decoder for real-time semantic segmentation. It consists of two blocks and fuses feature maps from different levels in two steps: information refinement and information fusion. In the first step, we propose a channel attention branch to refine the high-level feature maps and a spatial attention branch to refine the low-level ones. The refined high-level feature maps capture more exact semantic information and the refined low-level ones capture more accurate spatial information, which significantly improves the information-capturing ability of these feature maps. In the second step, we develop a new fusion module, named the pooling fusing block, to fuse the refined high- and low-level feature maps. This fusion block takes full advantage of the high- and low-level feature maps, leading to high-quality fusion results. To verify the efficiency of the proposed bilateral attention decoder, we adopt a lightweight network as the backbone and compare our method with other state-of-the-art real-time semantic segmentation methods on the Cityscapes and CamVid datasets. Experimental results demonstrate that our method achieves better performance with a higher inference speed. Moreover, we compare our network with several state-of-the-art non-real-time semantic segmentation methods and find that it can also attain better segmentation performance.
•We propose a refinement block to improve the information-capturing ability of feature maps.
•We develop a pooling fusion block to take full advantage of feature maps from different levels.
•Combining the two blocks, we propose a lightweight decoder for real-time semantic segmentation.
•Our method attains state-of-the-art accuracy and speed on the Cityscapes and CamVid datasets.
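
The abstract describes a two-step flow (refine each feature map with attention, then fuse) but gives no implementation details. The following is only a rough, dependency-free sketch of that flow under loud assumptions: feature maps are nested lists shaped [channel][row][column], the attention gates have no learned weights, both maps have the same channel count, and the fusion step is a plain upsample-and-add stand-in, not the paper's pooling fusing block. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Gate each channel by the sigmoid of its global average.

    Assumption: a weight-free stand-in for a channel attention branch.
    """
    out = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(fmap):
    """Gate each pixel by the sigmoid of the cross-channel mean at that pixel.

    Assumption: a weight-free stand-in for a spatial attention branch.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gate = [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[fmap[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return [
        [
            [row[j // factor] for j in range(len(row) * factor)]
            for row in ch
            for _ in range(factor)
        ]
        for ch in fmap
    ]


def fuse_add(high, low):
    """Placeholder fusion: upsample the high-level map and add element-wise.

    This is NOT the pooling fusing block from the paper, whose design
    is not given in the abstract.
    """
    factor = len(low[0]) // len(high[0])
    high_up = upsample_nearest(high, factor)
    return [
        [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
        for c1, c2 in zip(high_up, low)
    ]


def decode(high, low):
    """Step 1: refine both inputs. Step 2: fuse the refined maps."""
    return fuse_add(channel_attention(high), spatial_attention(low))
```

Calling `decode(high, low)` with a coarse high-level map and a finer low-level map returns a fused map at the low-level resolution. A real decoder would replace the fixed gates with learned layers and the additive step with the fusion block described in the full text.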
ISSN:	0893-6080 1879-2782
DOI:	10.1016/j.neunet.2021.01.021