Multiplication Through a Single Look-Up-Table (LUT) in CNN Inference Computation
| Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022-06, Vol. 41 (6), p. 1916-1928 |
|---|---|
| Main authors: | , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
| Summary: | Parameter quantization with a lower bit width is a common approach to reducing the computational load of CNN inference. With the parameters replaced by fixed-width binaries, multiplication operations can be replaced by a look-up table (LUT), where the multiplier and multiplicand operands serve as the table index and the precalculated products serve as the table elements. Because the histogram profiles of the parameters in different layers/channels of a CNN differ significantly, previous LUT-based computation methods have to use a different LUT for each layer/channel, and consequently demand larger memory space along with extra access time and power consumption. In this work, we first normalize the Gaussian parameter profiles of different layers/channels so that they have similar means and variances, and then quantize the normalized parameters to a fixed bit width through nonlinear quantization. Because the parameter profiles are normalized, one single compact LUT (16×16 entries) can replace all multiplication operations in the whole network. Furthermore, the normalization procedure also reduces the errors induced by quantization. Experiments demonstrate that with a compact 256-entry LUT, we achieve accuracy comparable to the results of 32-bit floating-point calculation, while significantly reducing the computational load and memory space, along with power consumption and hardware resources. |
|---|---|
| ISSN: | 0278-0070, 1937-4151 |
| DOI: | 10.1109/TCAD.2021.3095825 |
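
The sketch below illustrates the idea described in the summary: per-layer (or per-channel) weights are normalized toward a shared Gaussian profile, both operands are nonlinearly quantized to 4-bit codes, and every multiplication is replaced by a look-up into one shared 16×16 product table. This is a minimal illustration under assumed choices, not the authors' implementation: the quantile-based nonlinear levels, the NumPy-only setup, and function names such as `nonlinear_levels` and `lut_dot` are all hypothetical.

```python
import numpy as np

def normalize(w):
    """Shift/scale one layer's (or channel's) parameters to zero mean, unit variance."""
    mu, sigma = w.mean(), w.std()
    return (w - mu) / sigma, mu, sigma

def nonlinear_levels(samples, num_levels=16):
    """Pick 16 nonuniform quantization levels from the normalized distribution.
    (Quantile-based here; the paper's exact nonlinear quantizer may differ.)"""
    qs = (np.arange(num_levels) + 0.5) / num_levels
    return np.quantile(samples, qs)

def quantize(x, levels):
    """Map each value to the index of its nearest level (a 4-bit code)."""
    return np.abs(x[..., None] - levels).argmin(axis=-1)

def build_lut(levels):
    """One shared 16x16 table of precomputed products: lut[i, j] = levels[i] * levels[j]."""
    return np.outer(levels, levels)

def lut_dot(w_idx, a_idx, lut):
    """Dot product computed purely by table look-ups and additions."""
    return lut[w_idx, a_idx].sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=1024)   # toy layer weights
    a = rng.normal(0.0, 1.0, size=1024)    # toy activations

    w_n, _, _ = normalize(w)
    a_n, _, _ = normalize(a)

    levels = nonlinear_levels(np.concatenate([w_n, a_n]))
    lut = build_lut(levels)

    approx = lut_dot(quantize(w_n, levels), quantize(a_n, levels), lut)
    exact = np.dot(w_n, a_n)
    print(f"LUT dot product (normalized operands): {approx:.3f}   exact: {exact:.3f}")
```

Because the same 16×16 table serves every layer and channel, only 256 precomputed products need to be stored for the whole network, which is the memory and power saving the summary highlights.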