Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs

Bibliographic Details
Published in: ACM Transactions on Architecture and Code Optimization, 2023-09, Vol. 20 (3), pp. 1-26
Authors: Xu, Weizhi; Sun, Yintai; Fan, Shengyu; Yu, Hui; Fu, Xin
Format: Article
Language: English
Description
Abstract: The Convolutional Neural Network (CNN) is an important deep learning method that is widely used in many fields. However, CNN inference is very time consuming, and convolution usually takes most of that time. Feature maps and filters contain many zero values, which lead to redundant calculations and memory accesses when convolution is computed with dense methods. Many recent works exploit this sparsity to skip the calculations for zero values and thereby reduce CNN inference time. On the GPU platform, however, current works cannot fully exploit the sparsity of the feature map and achieve satisfactory performance. We therefore design a new parallel strategy that transforms the feature map into a new storage format, avoiding redundant computation on zero values on GPUs. Also exploiting the sparsity of the feature map, we propose a fused storage format that combines the convolution operation with the following pooling operation to further improve performance. We carry out experiments with mainstream CNN models and achieve better performance than cuDNN and cuSPARSE. For VGG-19, ResNet-50, DenseNet-121, and RegNetX-16GF, we obtain speedups of 1.97×, 2.23×, 2.74×, and 1.58× over cuDNN, respectively. The speedups over cuSPARSE are 2.10×, 1.83×, 2.35×, and 1.35×, respectively, when using only the first method.
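The abstract does not disclose the actual storage layout or GPU kernels the authors use, so the following is only a minimal NumPy sketch of the general idea it describes: compress the feature map so that only non-zero activations drive multiply-accumulates, and apply pooling right after convolution in the same pass. The function names compress_feature_map and sparse_conv2d_fused_maxpool, the CSR-like (column, value) row layout, and the 2×2 max pooling are illustrative assumptions, not the paper's design.

```python
import numpy as np

def compress_feature_map(fm):
    """Store each row of a 2D feature map as (column, value) pairs,
    keeping only non-zero entries. This CSR-like layout is a hypothetical
    stand-in for the paper's storage format, which the abstract does not specify."""
    rows = []
    for r in fm:
        nz = np.nonzero(r)[0]
        rows.append(list(zip(nz.tolist(), r[nz].tolist())))
    return rows

def sparse_conv2d_fused_maxpool(fm, kernel, pool=2):
    """Direct 2D convolution (valid padding) that iterates only over
    non-zero activations, followed by max pooling in the same call."""
    kh, kw = kernel.shape
    oh, ow = fm.shape[0] - kh + 1, fm.shape[1] - kw + 1
    sparse_rows = compress_feature_map(fm)
    conv = np.zeros((oh, ow))
    # Scatter each non-zero activation into every output position it
    # contributes to, so zero values never trigger a multiply-accumulate.
    for r, row in enumerate(sparse_rows):
        for c, v in row:
            for i in range(max(0, r - kh + 1), min(oh, r + 1)):
                for j in range(max(0, c - kw + 1), min(ow, c + 1)):
                    conv[i, j] += v * kernel[r - i, c - j]
    # Pooling applied immediately after convolution. The paper fuses the two
    # operations through a combined storage format; this sketch only chains
    # them within one function for illustration.
    ph, pw = oh // pool, ow // pool
    pooled = conv[:ph * pool, :pw * pool].reshape(ph, pool, pw, pool).max(axis=(1, 3))
    return pooled

if __name__ == "__main__":
    fm = np.random.rand(8, 8) * (np.random.rand(8, 8) > 0.7)   # roughly 70% zeros
    kernel = np.random.rand(3, 3)
    print(sparse_conv2d_fused_maxpool(fm, kernel))
```

On a GPU, the same idea would map to compacting the feature map in parallel and letting threads loop only over the stored non-zero entries; the sketch shows only the arithmetic being skipped, not the parallel strategy or memory-access scheme evaluated in the paper.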
ISSN: 1544-3566, 1544-3973
DOI: 10.1145/3600092