COMPAC: Compressed Time-Domain, Pooling-Aware Convolution CNN Engine With Reduced Data Movement for Energy-Efficient AI Computing
Saved in:
Published in: IEEE Journal of Solid-State Circuits, 2021-07, Vol. 56 (7), pp. 2205-2220
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In this work, we demonstrate a compressed time-domain, pooling-aware convolution (COMPAC) convolutional neural network (CNN) engine for energy-efficient edge AI computing by performing multi-bit input and multi-bit weight multiply-and-accumulate (MAC) operations in the time domain. The multi-bit inputs are compactly represented as a single pulsewidth-encoded input. This translates into reduced switching capacitance (C_DYN) compared with the baseline digital implementation and can enable low-power neural network computing in an edge device. The COMPAC CNN engine employs a novel, improved version of the memory delay line (MDL) that supports time residue scaling to perform the signed accumulation of multi-bit input and multi-bit weight products in the time domain. The compressed time-domain (CTD) approach is proposed to improve the throughput of time-encoding the input activations. Simulation results for the proposed CTD approach on the AlexNet CNN over 1000 ImageNet images show that, on average, 14.71 and 7.15 input clock cycles are consumed to time-encode an 8-bit input activation in the two CTD modes, improving the throughput by 88.60% and 94.46%, respectively, compared with conventional pulsewidth-modulation-based time-domain encoding. Furthermore, a pooling-aware convolution (PAC) technique is proposed to reduce the number of redundant MAC computations in convolution layers that are followed by a max-pooling layer. Simulation results on the AlexNet CNN over 1000 ImageNet images show up to a 31.47% (21.79%) reduction in the number of non-zero input-activation MACs, with a top-five classification accuracy loss of 0.60% (0.90%) and an on-chip access overhead of 60.53% (8.03%) for PAC mode 1 (mode 2), respectively. Finally, an energy-efficient data flow for optimal on-/off-chip memory accesses for the time-domain MAC computation is proposed.
The COMPAC data flow results in 86.97% fewer on-chip accesses and 29.74% fewer off-chip accesses compared with the Eyeriss approach at iso-bit precision. The COMPAC CNN engine, implemented in a 65-nm CMOS test chip, demonstrates an energy efficiency of 1.044 TOPS/W and a throughput of 0.1278 GOPS at 720 mV for AlexNet. A top-five classification accuracy of 76.90% is measured over 1000 ImageNet images, and 77.15% is achieved in simulation over 50,000 ImageNet images. The simulation results comprehending MDL circuit non-idealities …
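To make the CTD throughput claim concrete: conventional pulsewidth modulation needs up to 255 clock cycles to time-encode an 8-bit activation (one cycle per code), so a compressed encoding that splits the value into shorter pulse segments cuts the cycle count dramatically. The sketch below models this with a hypothetical split-nibble scheme (two 4-bit pulsewidths, the high nibble scaled by time residue scaling); the paper's actual CTD modes and its 14.71/7.15-cycle averages depend on the real AlexNet activation statistics, so the uniform-random numbers here are only illustrative.

```python
import random

def pwm_cycles(x):
    # Conventional PWM: an 8-bit value x occupies x clock cycles of pulsewidth.
    return x

def ctd_cycles(x):
    # Hypothetical compressed encoding (assumption, not the paper's exact scheme):
    # split the 8-bit value into two 4-bit nibbles and encode each as its own
    # pulsewidth; the MSB nibble's weight is restored later by time residue
    # scaling. Worst case drops from 255 cycles to 15 + 15 = 30 cycles.
    hi, lo = x >> 4, x & 0xF
    return hi + lo

random.seed(0)
acts = [random.randint(0, 255) for _ in range(10_000)]  # stand-in activations
avg_pwm = sum(map(pwm_cycles, acts)) / len(acts)
avg_ctd = sum(map(ctd_cycles, acts)) / len(acts)
print(f"avg PWM cycles: {avg_pwm:.1f}, avg split-nibble cycles: {avg_ctd:.1f}")
```

With uniformly distributed inputs the split-nibble average lands near 15 cycles versus roughly 127 for PWM; real activations are zero-heavy, which is why the paper's measured averages are lower still.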
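The PAC idea in the abstract is that when a convolution feeds a max-pooling layer, only one MAC result per pooling window survives, so the other full-precision MACs are redundant if the winner can be predicted cheaply. The sketch below is a minimal reading of that idea: a reduced-precision predictor (here, a MAC over the activations' 4 MSBs, which is an assumption, not the paper's circuit) picks the likely winner of each 2x2 window, and only that position gets the full-precision MAC, skipping the other three. Mispredictions are the source of the small accuracy loss the abstract reports.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(10, 10))   # 8-bit input activations
w = rng.integers(-8, 8, size=(3, 3))      # signed 3x3 weights

def conv_at(a, i, j):
    # Full-precision MAC for the 3x3 window at (i, j).
    return int(np.sum(a[i:i + 3, j:j + 3] * w))

# Baseline: full 8x8 convolution, then 2x2 max-pooling.
H = x.shape[0] - 2
full = np.array([[conv_at(x, i, j) for j in range(H)] for i in range(H)])
pooled = np.maximum.reduce(
    [full[0::2, 0::2], full[0::2, 1::2], full[1::2, 0::2], full[1::2, 1::2]]
)

def predict(a, i, j):
    # Cheap predictor: MAC over the activations' top 4 bits (assumed mechanism).
    return int(np.sum((a[i:i + 3, j:j + 3] >> 4) * w))

# PAC: full-precision MAC only for the predicted winner of each 2x2 window.
pac = np.zeros((H // 2, H // 2), dtype=int)
macs_skipped = 0
for pi in range(H // 2):
    for pj in range(H // 2):
        cands = [(2 * pi + di, 2 * pj + dj) for di in (0, 1) for dj in (0, 1)]
        best = max(cands, key=lambda ij: predict(x, *ij))
        pac[pi, pj] = conv_at(x, *best)
        macs_skipped += 3  # three full-precision MACs avoided per window
```

Since PAC always returns the full-precision value of one candidate per window, its output can undershoot but never exceed the exact pooled result; the predictor's own reduced-precision MACs and extra activation reads correspond to the on-chip access overhead quoted in the abstract.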
ISSN: 0018-9200, 1558-173X
DOI: 10.1109/JSSC.2020.3041502