Scale-pyramid dynamic atrous convolution for pixel-level labeling

Bibliographic details
Published in:	Expert Systems with Applications, 2024-05, Vol. 241, p. 122695, Article 122695
Main authors:	Li, Zhiqiang; Jiang, Jie; Chen, Xi; Zhang, Min; Wang, Yong; Li, Qingli; Qi, Honggang; Liu, Min; Laganière, Robert
Format:	Article
Language:	English
Subjects:	DCNN; Deep learning; Dynamic convolution; Kernel engineering; Pixel-level labeling
Online access:	Full text

Description
Abstract:	To achieve better performance, most deep convolutional neural networks increase model capacity by adding more convolutional layers or enlarging the filters. Consequently, the computational cost increases proportionally with the model capacity. This problem can be alleviated by dynamic convolution. In the case of pixel-level labeling, existing pixel-level dynamic convolution methods have a smaller scanning area than ordinary convolution or image-level dynamic convolution and are thus unable to exploit fine contextual information. As a consequence, pixel-level dynamic convolution is more sensitive to objects with large scale variation and to easily confused categories. In this paper, we propose a scale-pyramid dynamic atrous convolution (SDAConv) that exploits multi-scale pixel-level features at a finer granularity in order to efficiently increase model capacity, explore contextual information, capture detail information and alleviate the large-scale variation problem at the same time. Through kernel engineering (instead of network engineering), SDAConv dynamically arranges atrous filters in the individual convolutional kernels over different semantic areas at dense scales in the spatial dimension. With the regular convolution in SOTA architectures simply replaced by SDAConv, extensive experiments on three public benchmarks (Cityscapes, PASCAL VOC 2012 and ADE20K) demonstrate the superior performance of SDAConv on pixel-level labeling tasks.
• A new scale-pyramid dynamic atrous convolution (SDAConv) is proposed.
• SDAConv is truly dynamic and has multiple receptive fields.
• SDAConv can improve localization accuracy and explore rich contextual information.
• SDAConv can be used in a plug-and-play fashion in any architecture.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2023.122695
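
The abstract describes SDAConv as arranging atrous (dilated) filters dynamically, per spatial location and across several scales. This record gives no implementation details, so the following is only a rough, pure-Python sketch of that general idea and not the authors' SDAConv: one shared kernel is applied at several dilation rates, and each pixel mixes the responses with its own softmax weights. All names (`atrous_conv2d`, `pixelwise_dynamic_atrous`, `gate_logits`, `effective_kernel_size`) are hypothetical, and the per-pixel gate scores are simply passed in, whereas a real network would presumably predict them from the features.

```python
import math


def effective_kernel_size(k, rate):
    """Spatial extent covered by a k x k kernel at the given dilation rate."""
    return k + (k - 1) * (rate - 1)


def atrous_conv2d(image, kernel, rate):
    """Same-size 2D atrous convolution with zero padding.

    Assumes a square, odd-sized kernel; image and kernel are lists of lists.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Taps are spaced `rate` pixels apart around (y, x).
                    yy = y + (i - r) * rate
                    xx = x + (j - r) * rate
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_dynamic_atrous(image, kernel, rates, gate_logits):
    """Mix atrous responses per pixel.

    gate_logits[y][x] holds one score per dilation rate (hypothetical input;
    assumed here to be given rather than learned).
    """
    responses = [atrous_conv2d(image, kernel, rate) for rate in rates]
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            weights = softmax(gate_logits[y][x])
            out[y][x] = sum(
                wt * resp[y][x] for wt, resp in zip(weights, responses)
            )
    return out
```

In this sketch, a pixel whose gate favours a large rate effectively sees a wider context (a 3 x 3 kernel at rate 2 spans 5 x 5 pixels), while a pixel whose gate favours rate 1 keeps a tight, detail-preserving window. How SDAConv actually selects and arranges its filters is described in the full text.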