A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute

Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charged-domain mixed-signal operation for enhancing compute SNR a...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE journal of solid-state circuits 2019-06, Vol.54 (6), p.1789-1799
Hauptverfasser:	Valavi, Hossein, Ramadge, Peter J., Nestler, Eric, Verma, Naveen
Format:	Artikel
Sprache:	eng
Schlagworte:	Artificial neural networks Capacitors Charge-domain compute CMOS Computation Computer architecture Computer memory Convolution deep learning Digital computers hardware accelerators in-memory computing Integrated circuits Mathematical analysis Matrix algebra Matrix methods Metal oxides Method of moments Microprocessors Neural networks Neurons Random access memory Signal to noise ratio Weight
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charged-domain mixed-signal operation for enhancing compute SNR and, thus, scalability. The architecture supports analog/binary input activation (IA)/weight first layer (FL) and binary/binary IA/weight hidden layers (HLs), with batch normalization and input-output (IO) (buffering) circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as 8\times 8=64 in-memory-computing neuron tiles, supporting up to 512, 3\times 3\times 512 -input HL neurons and 64, 3\times 3\times 3 -input FL neurons, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with overlaying metal-oxide-metal (MOM) capacitor, yielding a structure having 1.8\times the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HLs/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18876/43.2 GOPS (1498/3.43 GOPS/mm 2 ), when implementing convolution layers; and 658/0.95 TOPS/W, 9438/10.47 GOPS (749/0.83 GOPS/mm 2 ), when implementing convolution followed by batch normalization layers. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to ideal digital computing.
ISSN:	0018-9200 1558-173X
DOI:	10.1109/JSSC.2019.2899730