TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization


Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Journal of Solid-State Circuits, 2023-03, Vol. 58 (3), pp. 852-866
Main authors: Guo, Ruiqi, Yue, Zhiheng, Si, Xin, Li, Hao, Hu, Te, Tang, Limei, Wang, Yabing, Sun, Hao, Liu, Leibo, Chang, Meng-Fan, Li, Qiang, Wei, Shaojun, Yin, Shouyi
Format: Article
Language: English
Subjects:
Description
Abstract: Computing-in-memory (CIM) is an attractive approach for energy-efficient deep neural network (DNN) processing, especially for low-power edge devices. However, today's typical DNNs usually exceed CIM static random access memory (SRAM) capacity, and the resulting off-chip communication offsets the benefits of the CIM technique, so CIM processors still face a memory bottleneck. To eliminate this bottleneck, we propose a CIM processor, called TT@CIM, which applies the tensor-train decomposition (TTD) method to compress the entire DNN to fit within CIM-SRAM. The storage reduction from TTD, however, comes at the cost of multiple serial small-size matrix multiplications, resulting in massive inefficient multiply-and-accumulate (MAC) and quantization operations (QuantOps). To achieve high energy efficiency, three optimization techniques are proposed in TT@CIM. First, a TTD-CIM-matched dataflow is proposed to maximize CIM utilization and minimize additional MAC operations. Second, a bit-level-sparsity-optimized CIM macro with a high-bit-level-sparsity encoding scheme is designed to reduce the power consumption of each MAC operation. Third, a variable-precision quantization method and a lookup-table-based quantization unit are presented to improve the performance and energy efficiency of QuantOps. Fabricated in 28-nm CMOS and tested on 4/8-bit decomposed DNNs, TT@CIM achieves 5.99-to-691.13-TOPS/W peak energy efficiency depending on the operating voltage.
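The abstract's central trade-off, compressing a weight matrix via tensor-train decomposition at the cost of extra small matrix multiplications, can be sketched numerically. The following is a minimal illustration, not the paper's method: it uses illustrative shapes, a hypothetical TT rank of 8, and the simplest two-core TT-matrix split obtained from a single truncated SVD.

```python
import numpy as np

# Hedged sketch: split one weight matrix into two TT-matrix cores with a
# single truncated SVD. All shapes and the rank are illustrative
# assumptions, not the configuration used in TT@CIM.
rng = np.random.default_rng(0)
m1, m2, n1, n2, rank = 16, 16, 16, 16, 8

# A weight matrix of shape (m1*m2, n1*n2) = (256, 256).
W = rng.standard_normal((m1 * m2, n1 * n2))

# Rearrange W so one SVD separates the (m1, n1) modes from (m2, n2):
# W[i1*m2 + i2, j1*n2 + j2] -> T[(i1, j1), (i2, j2)]
T = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)

U, s, Vt = np.linalg.svd(T, full_matrices=False)
G1 = U[:, :rank] * s[:rank]   # core 1: (m1*n1, rank)
G2 = Vt[:rank, :]             # core 2: (rank, m2*n2)

# Applying the layer now means two serial small matrix multiplications
# (the overhead the abstract mentions), in exchange for far less storage.
T_hat = G1 @ G2
err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
params_full = W.size
params_tt = G1.size + G2.size
print(f"truncation error: {err:.2f}")
print(f"params: {params_full} -> {params_tt} "
      f"({params_full / params_tt:.1f}x smaller)")
```

For a random (incompressible) matrix the truncation error is large; real DNN weights are far more redundant, which is why TTD can shrink an entire network into on-chip CIM-SRAM, at the price of the serial small-matrix MACs and QuantOps the paper then optimizes.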
ISSN: 0018-9200
      1558-173X
DOI: 10.1109/JSSC.2022.3198413