High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA
Published in: IEEE Micro, 2021-07, Vol. 41 (4), pp. 31-38
Main authors: , , , , ,
Format: Article
Language: English
Online access: Order full text
Abstract: Low-precision techniques can effectively reduce the computational complexity and bandwidth requirements of convolutional neural network (CNN) inference, but may lead to significant accuracy degradation. Mixed-low-precision techniques provide a superior approach for CNN inference, since they take advantage of low precision while maintaining accuracy. In this article, we propose a high-performance, highly flexible W8A8 (INT8 weight and INT8 activation) and WTA2 (ternary weight and INT2 activation) mixed-precision CNN inference hardware architecture, DPUmxp, designed and implemented on a Xilinx Virtex UltraScale+ VU13P FPGA with peak performance up to 58.9 TOPS.
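To make the two precision modes in the abstract concrete, below is a minimal NumPy sketch of symmetric INT8 quantization (the W8A8 path) and weight ternarization (the WTA2 path). This is an illustration only, not the paper's method: the function names, the fixed ternary threshold, and the use of INT8 rather than INT2 activations in the ternary example are all assumptions made for brevity.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ~ scale * q, q in [-127, 127]."""
    m = np.abs(x).max()
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def ternarize(w, threshold=0.05):
    """Ternary weights: map each value to {-1, 0, +1} plus one shared float scale.
    The threshold here is an arbitrary illustrative constant, not from the paper."""
    mask = np.abs(w) > threshold
    t = np.where(mask, np.sign(w), 0.0).astype(np.int8)
    scale = np.abs(w[mask]).mean() if mask.any() else 1.0
    return t, scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=8)   # toy weights
a = rng.normal(0.0, 1.0, size=8)   # toy activations

# W8A8: all multiply-accumulates run in integer arithmetic,
# with a single floating-point rescale at the end.
qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)
y_int8 = (qw.astype(np.int32) @ qa.astype(np.int32)) * sw * sa

# WTA2-style: ternary weights turn MACs into adds/subtracts of activations
# (shown here with INT8 activations instead of the paper's INT2).
tw, st = ternarize(w)
y_tern = (tw.astype(np.int32) @ qa.astype(np.int32)) * st * sa

print(y_int8, y_tern, w @ a)
```

The sketch shows why mixed precision is attractive on an FPGA: the W8A8 path keeps accuracy close to floating point, while the ternary path replaces multipliers with add/subtract logic at the cost of a coarser approximation.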
ISSN: 0272-1732, 1937-4143
DOI: 10.1109/MM.2021.3081735