Sense: Model-Hardware Codesign for Accelerating Sparse CNNs on Systolic Arrays

Bibliographic details
Published in:	IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2023-04, Vol.31 (4), p.1-14
Main authors:	Sun, Wenhao; Liu, Deng; Zou, Zhiwei; Sun, Wendi; Chen, Song; Kang, Yi
Format:	Article
Language:	eng
Subjects:	Accelerators; Artificial neural networks; Clustering; Co-design; Computer architecture; Convolutional neural network (CNN); Convolutional neural networks; Dynamic random access memory; Energy consumption; Feature maps; Field programmable gate arrays; Hardware; hardware accelerator; Kernel; Performance degradation; Random access memory; Sparsity; Sun; systolic array; Systolic arrays; weight pruning

Description
Abstract:	Sparsity is an intrinsic property of convolutional neural networks (CNNs) and is worth exploiting in CNN accelerators. However, the extra processing involved incurs hardware overhead, so most architectures see only marginal gains. Meanwhile, systolic arrays have become increasingly competitive for CNN acceleration because of their high spatiotemporal locality and low hardware overhead. However, the irregularity of sparsity induces imbalanced workloads under the rigid systolic dataflow, causing performance degradation. This article therefore proposes Sense, a systolic-array-based architecture for sparse CNN acceleration built by model-hardware codesign, enabling large performance gains. To balance input feature map (IFM) and weight loads across the processing element (PE) array, channel clustering gathers IFMs of similar sparsity for array computation, and a codesigned load-balancing weight pruning method keeps the sparsity ratio of each kernel at a certain value with little accuracy loss, improving PE utilization and overall performance. In addition, adaptive dataflow configuration determines the computing strategy from the storage ratio of IFMs and weights, reducing dynamic random access memory (DRAM) access by 1.17× to 1.8× compared with Swallow and further reducing system energy consumption. The whole design is implemented on a Zynq ZCU102 at 200 MHz and achieves 471, 34, 53, and 191 image/s for AlexNet, VGG-16, ResNet-50, and GoogleNet, respectively. Compared with the sparse systolic-array-based accelerators Swallow, fusion-enabled systolic architecture (FESA), and SPOTS, Sense achieves 0.97× to 2.18×, 1.3× to 1.67×, and 0.94× to 1.82× the energy efficiency (image/J) on these CNNs, respectively.
ISSN:	1063-8210 1557-9999
DOI:	10.1109/TVLSI.2023.3241933
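
Illustrative sketch:	The two load-balancing ideas named in the abstract can be sketched in a few lines of Python. This is not the authors' implementation: the abstract does not specify either algorithm, so magnitude-based pruning and a sort-then-chunk grouping are used here as assumed stand-ins, and every function name below is hypothetical. The point is only to show what "the same sparsity ratio for every kernel" and "grouping channels of similar sparsity" mean in practice.

```python
from typing import List


def prune_kernel(weights: List[float], sparsity: float) -> List[float]:
    """Zero the smallest-magnitude weights of one flattened kernel.

    Assumption: magnitude pruning stands in for the paper's method,
    which the abstract does not describe.
    """
    n_zero = int(sparsity * len(weights))  # number of weights to drop
    # Indices ordered by ascending magnitude; the first n_zero are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned


def prune_balanced(kernels: List[List[float]], sparsity: float) -> List[List[float]]:
    """Apply the same pruning ratio to every kernel so each PE sees equal work."""
    return [prune_kernel(k, sparsity) for k in kernels]


def sparsity_of(values: List[float]) -> float:
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def cluster_channels(channels: List[List[float]], group_size: int) -> List[List[int]]:
    """Group channel indices so that each group holds channels of similar sparsity.

    Assumption: sorting by sparsity and chunking is a simple proxy for
    the channel clustering mentioned in the abstract.
    """
    order = sorted(range(len(channels)), key=lambda c: sparsity_of(channels[c]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]
```

For example, calling `prune_balanced` with a ratio of 0.5 on two four-weight kernels leaves exactly two nonzero weights in each, so neither kernel would stall the other in a lockstep array. `cluster_channels` returns index groups ordered from the densest channels to the sparsest.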