A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices
Published in: Nature Electronics, 2021-01, Vol. 4 (1), pp. 81-90
Format: Article
Language: English
Online access: Full text
Abstract: The development of small, energy-efficient artificial intelligence edge devices is limited in conventional computing architectures by the need to transfer data between the processor and memory. Non-volatile compute-in-memory (nvCIM) architectures have the potential to overcome such issues, but the development of high-bit-precision configurations required for dot-product operations remains challenging. In particular, input–output parallelism and cell-area limitations, as well as signal-margin degradation, computing latency in multibit analogue readout operations and manufacturing challenges, still need to be addressed. Here we report a 2 Mb nvCIM macro (which combines memory cells and related peripheral circuitry) that is based on single-level-cell resistive random-access memory devices and is fabricated in a 22 nm complementary metal–oxide–semiconductor foundry process. Compared with previous nvCIM schemes, our macro can perform multibit dot-product operations with increased input–output parallelism, reduced cell-array area, improved accuracy, and reduced computing latency and energy consumption. The macro achieves latencies between 9.2 and 18.3 ns, and energy efficiencies between 146.21 and 36.61 tera-operations per second per watt, for binary and multibit input–weight–output configurations, respectively.
Commercial complementary metal–oxide–semiconductor and resistive random-access memory technologies can be used to create multibit compute-in-memory circuits capable of fast and energy-efficient inference for use in small artificial intelligence edge devices.
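In single-level-cell nvCIM designs of this kind, a multibit dot product is generally decomposed into binary partial products: each weight bit is stored in one binary cell, each input bit drives a word line, the analogue column current gives a binary partial sum, and the partial sums are shifted and added to recover the full-precision result. The Python sketch below illustrates only this general bit-serial principle; the bit widths, vector length and function name are illustrative assumptions and do not reproduce the circuit-level scheme of the macro reported in the paper.

```python
import numpy as np

def bit_serial_dot_product(inputs, weights, in_bits=4, w_bits=4):
    """Emulate a multibit dot product built from binary partial products.

    Illustrative sketch of the generic bit-serial principle behind
    single-level-cell compute-in-memory, not the paper's actual circuit:
    each weight bit sits in a binary cell, each input bit is applied as a
    binary word-line pulse, and the binary column sums are combined by
    digital shift-and-add.
    """
    acc = 0
    for i in range(in_bits):                          # input bit-plane, LSB first
        in_plane = (inputs >> i) & 1                  # binary inputs on word lines
        for j in range(w_bits):                       # weight bit-plane, LSB first
            w_plane = (weights >> j) & 1              # binary weights in SLC cells
            partial = int(np.dot(in_plane, w_plane))  # analogue column sum (emulated)
            acc += partial << (i + j)                 # digital shift-and-add
    return acc

# Usage: 4-bit unsigned inputs and weights, 8 products accumulated per column.
rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=8)
w = rng.integers(0, 16, size=8)
assert bit_serial_dot_product(x, w) == int(np.dot(x, w))
```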
ISSN: 2520-1131
DOI: 10.1038/s41928-020-00505-5