SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks



Bibliographic details
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024-07, Vol. 43 (7), p. 2163-2176
Main authors:	Chen, Yen-Wen; Wang, Rui-Hsuan; Cheng, Yu-Hsiang; Lu, Chih-Cheng; Chang, Meng-Fan; Tang, Kea-Tiong
Format:	Article
Language:	English
Subjects:	Adaptive filters; Algorithms; Artificial intelligence; Artificial neural networks; Co-design; Common Information Model (computing); Compression algorithms; Computational modeling; Computer architecture; computing-in-memory (CIM); Convolutional neural networks; deep learning; Efficiency; Hardware; Machine learning; Memory management; quantization; Quantization (signal); Random access memory; Static random access memory; Utilization

Description
Abstract:	Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM, quantization and pruning are indispensable for reducing the calculation complexity and improving the efficiency of hardware calculations. Mixed-precision quantization with flexible bit widths provides a better efficiency-accuracy tradeoff than fixed-precision quantization. However, CIM calculations for mixed-precision models are inefficient because the fixed capacity of CIM macros is redundant for hybrid-precision distributions. To address this, we propose a software-hardware co-designed, static random-access memory (SRAM)-based CIM architecture called SUN, comprising a CIM-adaptive mixed-precision joint pruning and quantization algorithm and a dynamic hybrid-precision CNN accelerator. Three techniques are implemented in this architecture: 1) a mixed-precision joint pruning algorithm for reducing memory access and removing redundant computation; 2) a CIM-adaptive filter-wise and paired mixed-precision quantization for improving CIM macro utilization; and 3) an SRAM-based CIM CNN accelerator in which the SRAM CIM macro is used as the processing element to support sparse and mixed-precision CNN computation with high CIM macro utilization. This architecture achieves a system area efficiency of 428.2 TOPS/mm² and a throughput of 792.2 GOPS on the CIFAR-10 dataset.
ISSN:	0278-0070; 1937-4151
DOI:	10.1109/TCAD.2024.3358583
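
The abstract's point that a fixed macro capacity is wasted on mixed-precision weights, and that pairing filters can raise utilization, can be illustrated with a small sketch. This is not the paper's algorithm: the 8-bit slot size (`SLOT_BITS`), the greedy pairing rule, and all function names are hypothetical and chosen only to make the utilization arithmetic concrete.

```python
from typing import List

# Hypothetical assumption: one CIM macro slot stores up to 8 weight bits.
SLOT_BITS = 8


def utilization(groups: List[List[int]], slot_bits: int = SLOT_BITS) -> float:
    """Fraction of allocated slot bits that actually hold weight bits."""
    used_bits = sum(sum(group) for group in groups)
    return used_bits / (len(groups) * slot_bits)


def one_filter_per_slot(bit_widths: List[int]) -> List[List[int]]:
    """Baseline: every filter occupies its own slot, whatever its bit width."""
    return [[b] for b in bit_widths]


def pair_filters(bit_widths: List[int], slot_bits: int = SLOT_BITS) -> List[List[int]]:
    """Greedy pairing (illustrative only): take the widest remaining filter and
    give it the widest remaining partner that still fits in the same slot."""
    remaining = sorted(bit_widths, reverse=True)
    groups: List[List[int]] = []
    while remaining:
        first = remaining.pop(0)
        partner_idx = next(
            (i for i, b in enumerate(remaining) if first + b <= slot_bits),
            None,
        )
        if partner_idx is None:
            groups.append([first])  # no partner fits, so the filter stays alone
        else:
            groups.append([first, remaining.pop(partner_idx)])
    return groups


if __name__ == "__main__":
    # Hypothetical per-filter bit widths produced by a mixed-precision quantizer.
    widths = [6, 2, 4, 4, 5, 3, 8, 2]
    print("unpaired utilization:", utilization(one_filter_per_slot(widths)))
    print("paired groups:", pair_filters(widths))
    print("paired utilization:", utilization(pair_filters(widths)))
```

In this toy model, pairing a narrow filter with a wide one fills slot bits that would otherwise sit idle, which is the intuition behind constraining the quantizer to produce bit widths that pair well.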