High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA

Bibliographic Details
Published in:	IEEE Micro, 2021-07, Vol. 41 (4), pp. 31-38
Main authors:	Wang, Junbin; Fang, Shaoxia; Wang, Xi; Ma, Jiangsha; Wang, Taobo; Shan, Yi
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Buffer storage; Computer architecture; Convolution; Field programmable gate arrays; Hardware; High performance computing; Inference; Ports (computers); Precision engineering; Quantization (signal); Weight

Description
Summary:	Low-precision techniques can effectively reduce the computational complexity and bandwidth requirements of convolutional neural network (CNN) inference, but may lead to significant accuracy degradation. Mixed-low-precision techniques provide a superior approach for CNN inference, since they retain the advantages of low precision while maintaining accuracy. In this article, we propose a high-performance, highly flexible W8A8 (INT8 weight and INT8 activation) and WTA2 (ternary weight and INT2 activation) mixed-precision CNN inference hardware architecture, DPUmxp, designed and implemented on a Xilinx Virtex UltraScale+ VU13P FPGA with peak performance of up to 58.9 TOPS.
ISSN:	0272-1732, 1937-4143
DOI:	10.1109/MM.2021.3081735
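
Illustration: The W8A8 and WTA2 formats named in the summary can be made concrete with a small sketch. The Python below is not taken from the article. It assumes a symmetric max-magnitude scale for INT8 and INT2 values, a signed two's-complement range for INT2, a threshold of 0.7 times the mean absolute weight for ternarization, and 2 bits of storage per ternary weight. All function names and parameters are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Uniform symmetric quantization to signed `bits`-bit integer codes.

    Assumption: the scale comes from the largest magnitude in `values`;
    the article's actual calibration scheme is not described in this record.
    """
    qmax = 2 ** (bits - 1) - 1      # 127 for INT8, 1 for INT2
    qmin = -(2 ** (bits - 1))       # -128 for INT8, -2 for INT2
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    # Round to the nearest code, then clamp into the representable range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return codes, scale


def quantize_ternary(weights, threshold_ratio=0.7):
    """Map each weight to -1, 0, or +1 using a magnitude threshold.

    Assumption: threshold = threshold_ratio * mean(|w|), a common heuristic,
    not necessarily the one used for DPUmxp.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    return [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]


def storage_bits(num_weights, num_activations, weight_bits, act_bits):
    """Total bits needed to hold the given weights and activations."""
    return num_weights * weight_bits + num_activations * act_bits


if __name__ == "__main__":
    weights = [0.9, -0.8, 0.05, -0.02]
    activations = [0.0, 1.0, -1.0, 0.4]

    print("INT8 weights:", quantize_symmetric(weights, 8)[0])
    print("Ternary weights:", quantize_ternary(weights))
    print("INT2 activations:", quantize_symmetric(activations, 2)[0])

    # Assumes ternary weights are stored in 2 bits each.
    print("W8A8 bits:", storage_bits(1000, 1000, 8, 8))
    print("WTA2 bits:", storage_bits(1000, 1000, 2, 2))
```

Under these assumptions a WTA2 layer needs a quarter of the storage of the same layer in W8A8, which is the kind of bandwidth saving the summary attributes to low precision. Keeping accuracy-sensitive layers in W8A8 is the mixed-precision trade-off.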