A Heterogeneous Microprocessor Based on All-Digital Compute-in-Memory for End-to-End AIoT Inference
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023-08, Vol. 70 (8), pp. 3099-3103
Format: Article
Language: English
Abstract: Deploying neural network (NN) models on Internet-of-Things (IoT) devices is important for enabling artificial intelligence (AI) at the edge, realizing the AI-of-Things (AIoT). However, the high energy consumption and bandwidth requirements of NN models restrict AI applications on battery-limited equipment. Compute-in-Memory (CIM), featuring high energy efficiency, provides new opportunities for the IoT deployment of NNs. However, the design of full CIM-based systems is still at an early stage, lacking system-level demonstration and vertical optimization for running end-to-end AI applications. In this brief, we demonstrate a low-power heterogeneous microprocessor System-on-Chip (SoC) with an all-digital SRAM CIM accelerator and rich data-acquisition interfaces for end-to-end AIoT NN inference. A dedicated reconfigurable dataflow controller for CIM computation greatly lowers the bandwidth requirement on the system bus and improves execution efficiency. The all-digital SRAM CIM array embeds NAND-based bit-serial multiplication within the readout sense amplifiers, balancing storage density and system-level throughput. Our chip achieves a throughput of 12.8 GOPS with 10 TOPS/W energy efficiency. Benchmarked on the four tasks in MLPerf Tiny, experimental results show 1.8x to 2.9x inference speedup over a baseline CIM processor.
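The abstract's reference to bit-serial multiplication can be illustrated at the behavioral level. The Python sketch below is only a generic illustration of bit-serial multiply-accumulate, not the paper's NAND-based sense-amplifier circuit or measured design; the function name and example values are hypothetical. Each activation bit is streamed one "cycle" at a time, gates the stored weights, and the resulting partial sum is accumulated with the shift corresponding to that bit's significance.

    # Behavioral sketch of bit-serial multiply-accumulate (assumed, not the paper's circuit).
    def bit_serial_mac(activations, weights, act_bits=8):
        """Compute sum(a * w) by streaming unsigned activation bits serially."""
        acc = 0
        for b in range(act_bits):
            # Current serial bit of each activation gates (ANDs) its stored weight.
            partial = sum(w for a, w in zip(activations, weights) if (a >> b) & 1)
            # Weight the per-cycle partial sum by the bit's significance.
            acc += partial << b
        return acc

    # Usage: the result matches an ordinary dot product (hypothetical example values).
    acts = [3, 7, 12, 1]
    wts = [5, 2, 9, 8]
    assert bit_serial_mac(acts, wts) == sum(a * w for a, w in zip(acts, wts))

Unrolling the accumulation shows why this works: summing (partial << b) over all bit positions b is the same as multiplying each weight by the full activation value, so the serialized form trades cycles for much simpler per-column multiply hardware.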
ISSN: 1549-7747, 1558-3791
DOI: 10.1109/TCSII.2023.3249245