A Heterogeneous Microprocessor Based on All-Digital Compute-in-Memory for End-to-End AIoT Inference


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023-08, Vol. 70 (8), p. 3099-3103
Authors: Yu, Songming; He, Yifan; Jia, Hongyang; Sun, Wenyu; Zhou, Mufeng; Lei, Luchang; Zhao, Wentao; Ma, Guofu; Yang, Huazhong; Liu, Yongpan
Format: Article
Language: English
Description
Abstract: Deploying neural network (NN) models on Internet-of-Things (IoT) devices is important for enabling artificial intelligence (AI) at the edge, realizing the AI-of-Things (AIoT). However, the high energy consumption and bandwidth requirements of NN models restrict AI applications on battery-limited equipment. Compute-In-Memory (CIM), featuring high energy efficiency, provides new opportunities for the IoT deployment of NNs. However, the design of full CIM-based systems is still at an early stage, lacking system-level demonstrations and vertical optimization for running end-to-end AI applications. In this brief, we demonstrate a low-power heterogeneous microprocessor System-on-Chip (SoC) with an all-digital SRAM CIM accelerator and rich data acquisition interfaces for end-to-end AIoT NN inference. A dedicated reconfigurable dataflow controller for CIM computation greatly lowers the bandwidth requirement on the system bus and improves execution efficiency. The all-digital SRAM CIM array embeds NAND-based bit-serial multiplication within the readout sense amplifier, balancing storage density and system-level throughput. Our chip achieves a throughput of 12.8 GOPS with 10 TOPS/W energy efficiency. Benchmarked on the four tasks in MLPerf Tiny, experimental results show a 1.8x to 2.9x inference speedup over a baseline CIM processor.
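To illustrate the bit-serial multiplication scheme the abstract mentions, below is a minimal Python sketch of a bit-serial multiply-accumulate. The function name, bit width, and data are illustrative assumptions, not taken from the paper; the sketch models only the arithmetic (stream activation bits, gate them against stored weights, accumulate with shift-and-add) and not the chip's NAND-in-sense-amplifier circuit or its dataflow controller.

def bit_serial_mac(weights, activations, act_bits=8):
    """Dot product via bit-serial shift-and-add over activation bits
    (unsigned activations below 2**act_bits; integer weights).
    Conceptual model only; names and widths are assumptions."""
    acc = 0
    for b in reversed(range(act_bits)):  # one activation bit per "cycle", MSB first
        # Gate each stored weight with the current activation bit and sum the
        # partial products (the abstract says the silicon does this with
        # NAND-based logic in the readout sense amplifier plus an adder tree).
        partial = sum(w * ((a >> b) & 1) for w, a in zip(weights, activations))
        acc = (acc << 1) + partial  # shift-and-add accumulation
    return acc

# Sanity check against an ordinary dot product.
ws = [3, -2, 5, 1]
xs = [17, 250, 8, 99]
assert bit_serial_mac(ws, xs) == sum(w * x for w, x in zip(ws, xs))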
ISSN: 1549-7747, 1558-3791
DOI: 10.1109/TCSII.2023.3249245