A 2.9-33.0 TOPS/W Reconfigurable 1-D/2-D Compute-Near-Memory Inference Accelerator in 10-nm FinFET CMOS


Detailed Description

Bibliographic Details
Published in: IEEE Solid-State Circuits Letters, 2020, Vol. 3, pp. 118-121
Main Authors: Sumbul, H. Ekin, Chen, Gregory K., Knag, Phil C., Kumar, Raghavan, Anders, Mark A., Kaul, Himanshu, Hsu, Steven K., Agarwal, Amit, Kar, Monodeep, Kim, Seongjong, Krishnamurthy, Ram K.
Format: Article
Language: English
Description
Summary: A 10-nm compute-near-memory (CNM) accelerator augments SRAM with multiply-accumulate (MAC) units to reduce interconnect energy and achieve 2.9 8b-TOPS/W for matrix-vector computation. The CNM provides high memory bandwidth by accessing SRAM subarrays to enable low-latency, real-time inference in fully connected and recurrent neural networks with small mini-batch sizes. For workloads with greater arithmetic intensity, such as large-batch convolutional neural networks, the CNM reconfigures into a 2-D systolic array to amortize memory access energy over a greater number of computations. Variable-precision 8b/4b/2b/1b MACs increase throughput by up to 8× for binary operations at 33.0 1b-TOPS/W.
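The two trade-offs in the abstract can be illustrated with a minimal sketch. This is my own simplified model, not the paper's implementation: it assumes arithmetic intensity for a fully connected layer grows linearly with mini-batch size (2·M·N·batch MAC operations amortized over M·N weight fetches), and that ops per cycle in the variable-precision MACs scale inversely with operand bit-width relative to the 8b baseline.

```python
# Hypothetical model (not from the paper): motivation for 1-D vs. 2-D
# reconfiguration, and the precision/throughput trade-off.

def arithmetic_intensity(batch: int) -> float:
    """MAC ops per weight element fetched for an MxN fully connected
    layer: 2*M*N*batch ops amortized over M*N weight fetches."""
    return 2.0 * batch

def relative_throughput(bits: int, baseline_bits: int = 8) -> float:
    """Throughput multiplier vs. the 8b baseline, assuming ops/cycle
    scale inversely with precision (8b/4b/2b/1b)."""
    if bits not in (1, 2, 4, 8):
        raise ValueError("supported precisions: 8b/4b/2b/1b")
    return baseline_bits / bits

# Small mini-batch (e.g. RNN inference): low arithmetic intensity favors
# the 1-D near-memory mode; large-batch CNNs: high intensity favors the
# 2-D systolic mode that amortizes each memory access over more MACs.
print(arithmetic_intensity(1))   # 2.0 ops per weight fetch
print(relative_throughput(1))    # 8.0 (matches the up-to-8x binary claim)
```

Under this model, binary (1b) operation yields the abstract's up-to-8× throughput over 8b; the reported efficiency gain (2.9 → 33.0 TOPS/W) is larger because lower-precision MACs also cost less energy per operation.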
ISSN: 2573-9603
DOI: 10.1109/LSSC.2020.3007185