Multiplication Through a Single Look-Up-Table (LUT) in CNN Inference Computation

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022-06, Vol. 41(6), pp. 1916-1928
Main Authors: Xu, Shiyu; Wang, Qi; Wang, Xingbo; Wang, Shihang; Ye, Terry Tao
Format: Article
Language: English
Description
Abstract: Parameter quantization with lower bit width is a common approach to reducing the computational load of CNN inference. With the parameters replaced by fixed-width binaries, multiplication operations can be replaced by a look-up table (LUT), where the multiplier and multiplicand operands serve as the table index and the precalculated products serve as the table elements. Because the histogram profiles of the parameters differ significantly across layers/channels in a CNN, previous LUT-based computation methods have to use a different LUT for each layer/channel, and consequently demand larger memory space along with extra access time and power consumption. In this work, we first normalize the parameters' Gaussian profiles across layers/channels so that they have similar means and variances, and then quantize the normalized parameters into fixed width through nonlinear quantization. Because of the normalized parameter profile, a single compact LUT (16×16 entries) can replace all multiplication operations in the whole network. Furthermore, the normalization procedure also reduces the errors induced by quantization. Experiments demonstrate that with a compact 256-entry LUT, we can achieve accuracy comparable to the results from 32-bit floating-point calculation, while significantly reducing the computation load and memory space, along with power consumption and hardware resources.
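The core idea described in the abstract, replacing every multiplication with a look-up into one shared 16×16 table of precomputed products after normalizing and nonlinearly quantizing the parameters, can be illustrated with a small sketch. The level values, the per-tensor normalization, and the helper names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only (not the authors' code): multiplications are
# replaced by look-ups into one shared 16x16 table of precomputed products,
# assuming weights and activations are first normalized to similar Gaussian
# profiles and quantized to 4-bit indices.

# 16 representative levels per operand; in the paper these come from a
# nonlinear quantization of the normalized distribution, here we simply use
# evenly spaced placeholder values.
weight_levels = np.linspace(-2.0, 2.0, 16)
activation_levels = np.linspace(-2.0, 2.0, 16)

# Single compact 256-entry LUT of precomputed products, shared by every
# layer/channel of the network.
product_lut = np.outer(weight_levels, activation_levels)   # shape (16, 16)

def normalize(x):
    """Shift/scale a parameter tensor to zero mean and unit variance."""
    return (x - x.mean()) / (x.std() + 1e-8)

def quantize(x, levels):
    """Map real values to the 4-bit index of the nearest representative level."""
    return np.abs(x[..., None] - levels).argmin(axis=-1)

def lut_dot(w_idx, a_idx):
    """Dot product computed with table look-ups and additions only."""
    return product_lut[w_idx, a_idx].sum()

# Example: one output of a tiny convolution without run-time multiplications.
rng = np.random.default_rng(0)
w = normalize(rng.normal(size=9))          # 3x3 kernel, flattened
a = normalize(rng.normal(size=9))          # matching activation patch
w_idx = quantize(w, weight_levels)
a_idx = quantize(a, activation_levels)
print(lut_dot(w_idx, a_idx))               # LUT-based approximation
print(float(w @ a))                        # reference floating-point result
```

Because both operands are reduced to 4-bit indices into the same normalized level set, the 256-entry table can serve the whole network, which is what removes the per-layer/per-channel LUTs the abstract refers to.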
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2021.3095825