High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA

Bibliographic Details
Published in: IEEE Micro, 2021-07, Vol. 41 (4), pp. 31-38
Main authors: Wang, Junbin; Fang, Shaoxia; Wang, Xi; Ma, Jiangsha; Wang, Taobo; Shan, Yi
Format: Article
Language: English
Description
Abstract: Low-precision techniques can effectively reduce the computational complexity and bandwidth requirements of convolutional neural network (CNN) inference, but may lead to significant accuracy degradation. Mixed-low-precision techniques provide a superior approach for CNN inference since they can take advantage of low precision while maintaining accuracy. In this article, we propose a high-performance, highly flexible W8A8 (INT8 weight and INT8 activation) and WTA2 (ternary weight and INT2 activation) mixed-precision CNN inference hardware architecture, DPUmxp, designed and implemented on a Xilinx Virtex UltraScale+ VU13P FPGA with peak performance up to 58.9 TOPS.
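
The abstract names the two number formats but not how values are mapped into them. As a rough illustration of what the formats mean numerically, the sketch below quantizes a few made-up values to INT8, ternary, and INT2. Everything in it is an assumption for illustration only: the function names `quantize_symmetric` and `ternarize`, the symmetric max-abs scaling, the signed INT2 range, and the fixed ternary threshold are not taken from the article and do not describe DPUmxp itself.

```python
def quantize_symmetric(values, bits):
    """Uniform symmetric quantization to signed `bits`-bit integers.

    Assumed scheme (not from the article): scale by the largest magnitude,
    round to nearest, and clamp to the signed integer range.
    """
    qmax = 2 ** (bits - 1) - 1       # 127 for INT8, 1 for INT2
    qmin = -(2 ** (bits - 1))        # -128 for INT8, -2 for INT2
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    quantized = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return quantized, scale


def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 with a fixed magnitude threshold.

    The threshold rule is a hypothetical choice for this sketch.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


# Made-up example data, not measurements from the article.
weights = [0.9, -0.05, 0.4, -0.7]
activations = [0.0, 0.3, 1.0, -0.6]

w8, w8_scale = quantize_symmetric(weights, bits=8)      # W8: INT8 weights
a8, a8_scale = quantize_symmetric(activations, bits=8)  # A8: INT8 activations
wt = ternarize(weights, threshold=0.2)                  # WT: ternary weights
a2, a2_scale = quantize_symmetric(activations, bits=2)  # A2: INT2 activations
```

In this toy setting a WTA2 layer stores each weight and activation in 2 bits instead of 8, which is the kind of saving that motivates mixing it with W8A8 for layers that are more sensitive to precision loss.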
ISSN: 0272-1732
EISSN: 1937-4143
DOI: 10.1109/MM.2021.3081735