An Emerging NVM CIM Accelerator With Shared-Path Transpose Read and Bit-Interleaving Weight Storage for Efficient On-Chip Training in Edge Devices

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023-07, Vol. 70 (7), pp. 2645-2649
Authors: Guo, Zhiwang; Chen, Deyang; Zhao, Chenyang; Fang, Jinbei; Jiang, Jingwen; Liu, Yixuan; Tian, Haidong; Xiong, Xiankui; Zhou, Keji; Xue, Xiaoyong; Liu, Qi; Zeng, Xiaoyang
Format: Article
Language: English
Abstract: Computing-in-memory (CIM) helps improve the energy efficiency of computing by reducing data movement. In edge devices, CIM accelerators need to support lightweight on-chip training to adapt the model to environmental changes and to keep edge data secure. However, most previous CIM accelerators for edge devices realize only inference, with training performed on the cloud, and adding support for on-chip training typically incurs remarkable area cost and serious performance degradation. In this brief, a CIM accelerator based on emerging nonvolatile memory (NVM) is presented with shared-path transpose read and bit-interleaving weight storage for efficient on-chip training in edge devices. The shared-path transpose read employs a new biasing scheme that eliminates the influence of the body effect on the transpose read, improving both read margin and speed. The bit-interleaving weight storage splits multi-bit weights into individual bits that are stored alternately in the array, markedly speeding up the computation of the training process. For 8-bit inputs and weights, evaluation in a 28 nm process shows that the proposed accelerator achieves ~3.34/3.06 TOPS/W energy efficiency for feed-forward/back-propagation, 4.6X lower computing latency, and at least 20% smaller chip size compared to the baseline design.
ISSN: 1549-7747
1558-3791
DOI: 10.1109/TCSII.2023.3240193
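
The bit-interleaving weight storage described in the abstract splits each multi-bit weight into individual bits that are stored alternately in the array. As a rough illustration only (the NumPy-based layout, function names, and LSB-first ordering below are assumptions for the sketch, not the paper's actual array mapping), a minimal software model of such bit-plane interleaving and its inverse:

import numpy as np

def interleave_weight_bits(weights, n_bits=8):
    # Split signed n_bits weights into bit planes and interleave them
    # column-wise, so bit k of every weight lands in columns k, k+n_bits, ...
    # (Illustrative layout only; the paper's array mapping may differ.)
    w = np.asarray(weights, dtype=np.int64)
    unsigned = w & ((1 << n_bits) - 1)                      # two's-complement view
    rows, cols = unsigned.shape
    out = np.empty((rows, cols * n_bits), dtype=np.uint8)
    for k in range(n_bits):
        out[:, k::n_bits] = (unsigned >> k) & 1             # LSB-first bit planes
    return out

def deinterleave_weight_bits(bit_array, n_bits=8):
    # Reassemble signed weights from the interleaved bit planes.
    rows, total = bit_array.shape
    cols = total // n_bits
    unsigned = np.zeros((rows, cols), dtype=np.int64)
    for k in range(n_bits):
        unsigned |= bit_array[:, k::n_bits].astype(np.int64) << k
    sign = 1 << (n_bits - 1)
    return (unsigned ^ sign) - sign                         # restore two's-complement sign

# Round-trip check on random signed 8-bit weights
w = np.random.randint(-128, 128, size=(4, 3))
assert np.array_equal(deinterleave_weight_bits(interleave_weight_bits(w)), w)

The round-trip check confirms that interleaving followed by de-interleaving recovers the original signed 8-bit weights; in the accelerator itself the analogous mapping would be realized in the NVM array layout rather than in software, which is what allows the individual weight bits to be accessed in parallel during training.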