UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks

Bibliographic Details
Published in: ACM Transactions on Computer Systems, June 2021, Vol. 37 (1-4), pp. 1-15
Authors: Baskin, Chaim; Liss, Natan; Schwartz, Eli; Zheltonozhskii, Evgenii; Giryes, Raja; Bronstein, Alex M.; Mendelson, Avi
Format: Article
Language: English
Abstract
We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise into the weights at training time. As a by-product of this noise injection, we find that activations can also be quantized to as low as 8 bits with only minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to existing uniform quantization techniques for neural networks. We further propose a new complexity metric, the number of bit operations performed (BOPs), and show that it relates linearly to logic utilization and power. We suggest evaluating the trade-off between accuracy and complexity (BOPs). The proposed method, evaluated on ResNet18/34/50 and MobileNet on ImageNet, outperforms the prior state of the art in both the low-complexity and the high-accuracy regimes. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on an FPGA.
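
To make the training-time mechanism concrete, the sketch below illustrates the general idea of emulating a k-quantile (equal-mass) quantizer by injecting uniform noise into the weights during training, with hard quantization applied only at deployment. This is a minimal PyTorch illustration, not the authors' implementation: the function names, the per-bin noise width, and the use of torch.quantile to place the bin edges are assumptions made purely for this example.

import torch

def quantile_edges(w: torch.Tensor, num_bits: int) -> torch.Tensor:
    # k-quantile (equal-mass) bin edges: each bin holds the same fraction of weights.
    k = 2 ** num_bits
    probs = torch.linspace(0.0, 1.0, k + 1, device=w.device)
    return torch.quantile(w.flatten(), probs)

def inject_uniform_noise(w: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # Training-time surrogate: add noise uniform over the width of each weight's bin,
    # emulating the quantization error while keeping the weights continuous for SGD.
    idx = torch.bucketize(w, edges[1:-1])
    widths = (edges[1:] - edges[:-1])[idx]
    noise = (torch.rand_like(w) - 0.5) * widths
    return w + noise

def hard_quantize(w: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # Deployment-time quantizer: snap each weight to the centre of its bin.
    idx = torch.bucketize(w, edges[1:-1])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[idx]

# Hypothetical usage for one layer's weight tensor:
w = torch.randn(256, 128)
edges = quantile_edges(w, num_bits=4)
w_noisy = inject_uniform_noise(w, edges)   # used in the forward pass during training
w_deployed = hard_quantize(w, edges)       # used after training, on the target hardware

For simplicity this sketch maps each bin to its midpoint; a true k-quantile quantizer would typically map each bin to its conditional mean, and the noise distribution would be matched to the resulting quantization error.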
ISSN: 0734-2071, 1557-7333
DOI: 10.1145/3444943