RV-SCNN: A RISC-V Processor With Customized Instruction Set for SNN and CNN Inference Acceleration on Edge Platforms

Bibliographic details
Published in:	IEEE transactions on computer-aided design of integrated circuits and systems 2024-10, p.1-1
Main authors:	Wang, Xingbo; Feng, Chenxi; Kang, Xinyu; Wang, Qi; Huang, Yucong; Ye, Terry Tao
Format:	Article
Language:	eng
Subjects:	Acceleration; CNN; Convolution; Convolutional neural networks; Edge Computing; Hardware; Instruction sets; Logic; Memory management; Neurons; Process control; Registers; RISC-V; SIMD; Single instruction multiple data; SNN

Description
Summary:	The rapid advancement of artificial intelligence (AI) applications has driven an increasing demand for running inference tasks on edge devices. However, implementing computation-intensive neural networks on resource-constrained edge systems remains a significant challenge. In this paper, we propose a novel processor architecture called RV-SCNN to address this challenge. The architecture is based on the generic RISC-V instruction set and adds several Single Instruction Multiple Data (SIMD) custom instruction extensions that accelerate the computation of Spiking Neural Networks (SNNs) and Convolutional Neural Networks (CNNs), enabling efficient execution of complex neural network models. The core operators of the processor are shared by SNN and CNN operations, so the processor supports both computation modes. Further acceleration features include an internal hardware loop control unit that reduces instruction overhead, an address calculation unit and an inter-layer fusion unit that minimize memory access overhead, and an Image to Column (IM2COL) unit that improves the computational efficiency of the 3×3 convolutions in SNNs and CNNs. The custom instructions are called through inline assembly in the C program, which provides more flexibility than traditional ASICs and supports custom, complex SNN/CNN network structures. Compared to traditional instruction sets, the RV-SCNN processor reduces the execution time of CNNs and SNNs by over 90%. We validate the processor on an FPGA platform and evaluate its performance in a 55 nm CMOS process. The processor achieves an operational efficiency of 9.88 pJ/SOP in SNN inference tasks, while the peak energy efficiency reaches 679 GOPS/W in CNN inference.
ISSN:	0278-0070 1937-4151
DOI:	10.1109/TCAD.2024.3472293
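
The summary mentions an Image to Column (IM2COL) unit for 3×3 convolutions and core operators shared between SNN and CNN modes. As a rough software illustration only (not the paper's hardware design or its instruction set), the sketch below unfolds each 3×3 window into a row so that the convolution becomes a series of dot products. The function names, stride 1, and the absence of padding are assumptions made for this example. With binary spike inputs the same dot product reduces to summing the weights at positions where a spike occurred, which is one plausible reading of why a single operator could serve both modes.

```python
def im2col_3x3(image):
    """Unfold every 3x3 window of a 2-D list into one row of 9 values.

    Assumes stride 1 and no padding (illustrative choices, not from the paper).
    """
    h, w = len(image), len(image[0])
    rows = []
    for r in range(h - 2):
        for c in range(w - 2):
            rows.append([image[r + i][c + j] for i in range(3) for j in range(3)])
    return rows


def conv3x3_via_im2col(image, kernel):
    """3x3 convolution (cross-correlation, as used in CNNs) via im2col."""
    flat_kernel = [kernel[i][j] for i in range(3) for j in range(3)]
    out_w = len(image[0]) - 2
    # Each unfolded window is reduced with one dot product against the kernel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in im2col_3x3(image)]
    # Fold the flat result back into a 2-D output map.
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]


# CNN-style input: multi-valued activations.
activations = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
ones_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
cnn_out = conv3x3_via_im2col(activations, ones_kernel)

# SNN-style input: binary spikes, so each product is either 0 or the weight.
spikes = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
weights = [[2, 5, 7], [1, 3, 9], [4, 6, 8]]
snn_out = conv3x3_via_im2col(spikes, weights)
```

Unfolding duplicates overlapping pixels, trading memory for a regular access pattern; a hardware unit could generate the unfolded rows on the fly instead of storing them, though the abstract does not say how RV-SCNN does this.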