Simulating quantized inference on convolutional neural networks

Bibliographic details
Published in: Computers & Electrical Engineering, 2021-10, Vol. 95, p. 107446, Article 107446
Authors: Finotti, Vitor; Albertini, Bruno
Format: Article
Language: English
Description
Abstract: Mobile and embedded applications of convolutional neural networks (CNNs) use quantization to reduce model size and increase computational efficiency. However, working with quantized networks often implies using non-standard training and execution methods, as modern frameworks offer limited support for fixed-point operations. We propose a quantization approach that simulates the effects of quantization in CNN inference without requiring networks to be re-implemented in fixed-point arithmetic, reducing the overhead and complexity of evaluating existing networks’ responses to quantization. The proposed method provides a fast way of performing post-training quantization with different bit widths for activations and weights. Our experimental results on ImageNet CNNs show a model size reduction of more than 50% while maintaining classification accuracy without the need for retraining. We also measured the relationship between classification complexity and tolerance to quantization, finding an inverse correlation between quantization level and dataset complexity.

Highlights:
• Simulation of fixed-point quantization inference in convolutional neural networks.
• Quantization of convolutional neural network inference in PyTorch.
• Model size reduction on ImageNet architectures by more than 50% without accuracy loss.
• CNN architectures classifying simple datasets are more tolerant to quantization.
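To make the idea in the abstract concrete, the sketch below shows one common way post-training quantization can be simulated in PyTorch: weights and activations are quantized to a chosen bit width and immediately dequantized, so inference still runs in floating point while exhibiting fixed-point rounding and clipping effects. This is an illustrative assumption, not the authors' implementation; the names fake_quantize, QuantizedConv2d, and quantize_model, the symmetric per-tensor scaling, and the default bit widths are all hypothetical.

```python
# Minimal sketch (assumption, not the paper's code): simulated fixed-point
# quantization in PyTorch via quantize-dequantize ("fake quantization").
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Simulate symmetric fixed-point quantization at the given bit width."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = x.abs().max() / qmax            # per-tensor scale (illustrative choice)
    if scale == 0:
        return x
    # Round to the nearest quantization level, clip, then map back to float.
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale


class QuantizedConv2d(nn.Module):
    """Wraps an existing Conv2d, quantizing its weights and output activations."""

    def __init__(self, conv: nn.Conv2d, weight_bits: int = 8, act_bits: int = 8):
        super().__init__()
        self.conv = conv
        self.weight_bits = weight_bits
        self.act_bits = act_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = fake_quantize(self.conv.weight, self.weight_bits)
        y = F.conv2d(x, w_q, self.conv.bias, self.conv.stride,
                     self.conv.padding, self.conv.dilation, self.conv.groups)
        return fake_quantize(y, self.act_bits)


def quantize_model(model: nn.Module, weight_bits: int = 8, act_bits: int = 8) -> nn.Module:
    """Post-training: wrap every Conv2d in a pretrained model, no retraining."""
    for name, module in model.named_children():
        if isinstance(module, nn.Conv2d):
            setattr(model, name, QuantizedConv2d(module, weight_bits, act_bits))
        else:
            quantize_model(module, weight_bits, act_bits)
    return model
```

With wrappers like this, an existing pretrained ImageNet model can be evaluated at several weight and activation bit widths without retraining, mirroring the kind of quantization-sensitivity study the abstract describes.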
ISSN: 0045-7906, 1879-0755
DOI: 10.1016/j.compeleceng.2021.107446