MXQN: Mixed quantization for reducing bit-width of weights and activations in deep convolutional neural networks

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), 2021-07, Vol. 51 (7), p. 4561-4574
Main authors: Huang, Chenglong; Liu, Puguang; Fang, Liang
Format: Article
Language: English
Online access: Full text
Description
Abstract: Quantization, which involves bit-width reduction, is considered one of the most effective approaches to rapidly and energy-efficiently deploy deep convolutional neural networks (DCNNs) on resource-constrained embedded hardware. However, reducing the bit-width of the weights and activations of DCNNs seriously degrades accuracy. To solve this problem, in this paper we propose a mixed hardware-friendly quantization (MXQN) method that applies fixed-point quantization and logarithmic quantization to DCNNs without the need to retrain or fine-tune the network. MXQN is a multi-stage process: first, we employ the signal-to-quantization-noise ratio (SQNR) as the metric to estimate the interplay between the parameter quantization errors of each layer and the overall model prediction accuracy. Then, we quantize the weights with fixed-point quantization and, depending on the SQNR metric, empirically select either logarithmic or fixed-point quantization for the activations. For improved accuracy, we propose an optimized logarithmic quantization scheme that affords a fine-grained step size. We evaluate the performance of MXQN using the VGG16 network on the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets, as well as the VGG19 and ResNet (ResNet18, ResNet34, ResNet50) networks on ImageNet, and demonstrate that the MXQN-quantized DCNN, despite not being retrained or fine-tuned, still achieves accuracy close to that of the original DCNN.
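The abstract describes two building blocks that MXQN mixes, fixed-point and logarithmic quantization, and an SQNR metric used to decide which scheme to apply to a layer's activations. Below is a minimal illustrative sketch of those generic ingredients in NumPy; the helper names (sqnr_db, fixed_point_quantize, log_quantize), the bit allocation, and the clipping choices are assumptions for illustration and do not reproduce the paper's optimized fine-grained step size.

```python
import numpy as np

def sqnr_db(x, x_q):
    """Signal-to-quantization-noise ratio in dB (higher means less quantization error)."""
    noise_power = np.sum((x - x_q) ** 2) + 1e-12
    return 10.0 * np.log10(np.sum(x ** 2) / noise_power)

def fixed_point_quantize(x, total_bits=8):
    """Uniform (fixed-point) quantization: round onto a signed integer grid and rescale."""
    # Pick the step size so the largest magnitude still fits in the signed range.
    int_bits = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-12)))
    step = 2.0 ** (int_bits - (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    return np.clip(np.round(x / step), -q_max - 1, q_max) * step

def log_quantize(x, total_bits=8):
    """Logarithmic quantization: keep the sign and a clipped power-of-two exponent."""
    sign = np.sign(x)
    exponent = np.round(np.log2(np.abs(x) + 1e-12))
    # Keep only the exponent window representable with the remaining bits.
    exponent = np.clip(exponent, exponent.max() - (2 ** (total_bits - 1) - 1), exponent.max())
    return sign * 2.0 ** exponent

# Example: choose the scheme with the higher SQNR for a layer's activations.
acts = np.abs(np.random.randn(10_000))  # ReLU-like, non-negative activations
candidates = {"fixed": fixed_point_quantize(acts), "log": log_quantize(acts)}
best = max(candidates, key=lambda name: sqnr_db(acts, candidates[name]))
print({name: round(sqnr_db(acts, q), 2) for name, q in candidates.items()}, "->", best)
```

The per-layer selection step at the end mirrors the idea of using SQNR as the decision metric; the paper's actual procedure relates each layer's quantization error to overall model accuracy rather than comparing two schemes in isolation.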
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-020-02109-0