A 617-TOPS/W All-Digital Binary Neural Network Accelerator in 10-nm FinFET CMOS

Bibliographic details
Published in: IEEE Journal of Solid-State Circuits, 2021-04, Vol. 56 (4), p. 1082-1092
Main authors: Knag, Phil C.; Chen, Gregory K.; Sumbul, H. Ekin; Kumar, Raghavan; Hsu, Steven K.; Agarwal, Amit; Kar, Monodeep; Kim, Seongjong; Anders, Mark A.; Kaul, Himanshu; Krishnamurthy, Ram K.
Format: Article
Language: English
Description
Summary: A binary neural network (BNN) chip explores the limits of energy efficiency and computational density for an all-digital deep neural network (DNN) inference accelerator. The chip intersperses data storage and computation using computation near memory (CNM) to reduce interconnect and data movement costs. It performs wide inner product operations to leverage parallelism inherent in DNN computations. The BNN chip leverages lightweight pipelining at a near-threshold voltage (NTV) to reduce the overhead of sequential elements. It employs optimized data access patterns to reduce memory accesses for convolutional operations with pooling layers. The combination of these techniques enables the BNN chip to achieve a peak energy efficiency of 617 TOPS/W. The digital BNN chip approaches the energy efficiency of analog in-memory techniques while also ensuring deterministic, scalable, and bit-accurate operation. Moreover, the all-digital design leverages process scaling and does not require additional memory transistors or passive devices to attain a peak compute density of 418 TOPS/mm² and a memory density of 414 KB/mm². The binary design is extended to enable bit-serial integer precision operation with a reconfigurable 1-b multiplication circuit and element-wise partial sum shift and accumulate. This technique allows for fine-grain mixed precision and retains energy efficiency by exploiting parallelism inherent in DNNs. The bit-serial binary operation allows for bit-accurate operation and high DNN accuracy that multibit analog compute-in-memory designs struggle to attain. It provides favorable energy tradeoffs compared with small-integer digital DNN accelerators.
ISSN: 0018-9200, 1558-173X
DOI: 10.1109/JSSC.2020.3038616
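
Illustrative note: the summary mentions wide binary inner products and a bit-serial extension that uses 1-b multiplication with shift and accumulate. The Python sketch below shows the general arithmetic behind those two ideas and is not the chip's actual datapath. It assumes a {-1, +1} encoding for the binary case (bit 1 means +1, bit 0 means -1) and unsigned integers for the bit-serial case. The function names (`binary_dot`, `pack_bitplane`, `bit_serial_dot`) are hypothetical, and the shift is applied once per bit-plane pair for brevity, whereas the summary describes an element-wise shift and accumulate.

```python
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n {-1,+1} vectors packed as bits.

    Assumed encoding: bit 1 -> +1, bit 0 -> -1; element 0 is the LSB.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask      # XNOR: 1 where the signs match
    matches = bin(agree).count("1")        # popcount
    return 2 * matches - n                 # matches - mismatches


def pack_bitplane(values, bit: int) -> int:
    """Pack bit number `bit` of each unsigned integer into one word."""
    plane = 0
    for i, v in enumerate(values):
        plane |= ((v >> bit) & 1) << i
    return plane


def bit_serial_dot(acts, weights, act_bits: int, weight_bits: int) -> int:
    """Unsigned integer dot product built only from 1-b products.

    Each pair of bit planes is ANDed (a 1-b multiply), popcounted,
    shifted by its combined bit weight, and accumulated.
    """
    total = 0
    for i in range(act_bits):
        a_plane = pack_bitplane(acts, i)
        for j in range(weight_bits):
            w_plane = pack_bitplane(weights, j)
            partial = bin(a_plane & w_plane).count("1")
            total += partial << (i + j)    # shift and accumulate
    return total


if __name__ == "__main__":
    # a = [+1, -1, +1, +1], w = [+1, +1, -1, +1], element 0 in the LSB
    print(binary_dot(0b1101, 0b1011, 4))
    # Compare against an ordinary integer dot product
    acts, weights = [3, 1, 2], [2, 3, 1]
    print(bit_serial_dot(acts, weights, 2, 2),
          sum(a * w for a, w in zip(acts, weights)))
```

Because every step is an exact integer operation, the bit-serial result equals the ordinary integer dot product, which is the sense in which such a scheme is bit-accurate. The number of passes grows with the product of the two bit widths, which is the cost traded for precision.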