A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices

Bibliographic details
Published in: Nature electronics, 2021-12, Vol. 4 (12), p. 921-930
Main authors: Hung, Je-Min; Xue, Cheng-Xin; Kao, Hui-Yao; Huang, Yen-Hsiang; Chang, Fu-Chun; Huang, Sheng-Po; Liu, Ta-Wei; Jhang, Chuan-Jia; Su, Chin-I; Khwa, Win-San; Lo, Chung-Chuan; Liu, Ren-Shuo; Hsieh, Chih-Cheng; Tang, Kea-Tiong; Ho, Mon-Shu; Chou, Chung-Cheng; Chih, Yu-Der; Chang, Tsung-Yung Jonathan; Chang, Meng-Fan
Format: Article
Language: English
Online access: Full text
Description
Summary: Non-volatile computing-in-memory (nvCIM) architecture can reduce the latency and energy consumption of artificial intelligence computation by minimizing the movement of data between the processor and memory. However, artificial intelligence edge devices with high inference accuracy require large-capacity nvCIM macros capable of high-bit-precision dot-product operations. Here we report a four-megabit nvCIM macro that combines memory cells with peripheral circuitry and is based on 22-nm-foundry binary resistive random-access memory devices and complementary metal–oxide–semiconductor (CMOS) processes. The fully CMOS-integrated macro features an asymmetrically modulated input-and-calibration scheme, a calibrated-and-weighted current-to-voltage stacking read scheme, and input-shaping hardware to overcome the challenges involved in designing large-capacity nvCIM macros with high bit precision. The macro offers latencies between 5.2 and 15.2 ns and energy efficiency between 194.4 and 15.6 tera-operations per second per watt in binary to 8-bit-input–8-bit-weight dot-product operations. Advanced complementary metal–oxide–semiconductor technology and resistive random-access memory can be used to create high-bit-precision compute-in-memory macros for low latency and efficient edge computing.
ISSN: 2520-1131
DOI: 10.1038/s41928-021-00676-9
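
Note on the efficiency figures: the summary quotes energy efficiency in tera-operations per second per watt (TOPS/W). Since 1 TOPS/W corresponds to 10^12 operations per joule, the figures can be restated as energy per operation. The short Python sketch below does only that arithmetic. The function name and the pairing of 194.4 TOPS/W with binary mode and 15.6 TOPS/W with 8-bit-input, 8-bit-weight mode are assumptions read from the order of the sentence; the record does not define what is counted as one operation.

```python
# Restate an energy-efficiency figure (TOPS/W) as energy per operation.
# 1 TOPS/W = 1e12 operations per joule, so energy/op = 1 / (TOPS/W * 1e12) joules.

def energy_per_op_fj(tops_per_watt: float) -> float:
    """Return the energy per operation in femtojoules (hypothetical helper)."""
    if tops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    joules_per_op = 1.0 / (tops_per_watt * 1e12)
    return joules_per_op * 1e15  # joules -> femtojoules

# Figures quoted in the summary; the mode labels are an assumed pairing.
REPORTED_TOPS_PER_WATT = {
    "binary": 194.4,
    "8-bit input, 8-bit weight": 15.6,
}

for mode, efficiency in REPORTED_TOPS_PER_WATT.items():
    femtojoules = energy_per_op_fj(efficiency)
    print(f"{mode}: {efficiency} TOPS/W is about {femtojoules:.2f} fJ per operation")
```

Under these assumptions, the quoted range works out to roughly 5.14 fJ per operation at the high-efficiency end and roughly 64.10 fJ per operation at the 8-bit end.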