Universal Deep Neural Network Compression

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Signal Processing, 2020-05, Vol. 14 (4), pp. 715-726
Authors: Choi, Yoojin; El-Khamy, Mostafa; Lee, Jungwon
Format: Article
Language: English
Abstract: We consider compression of deep neural networks (DNNs) by weight quantization and lossless source coding for memory-efficient deployment. Whereas previous work addressed non-universal scalar quantization and entropy coding, we introduce, for the first time, universal DNN compression by universal vector quantization and universal source coding. In particular, the proposed scheme employs universal lattice quantization, which randomizes the source by uniform random dithering before lattice quantization and performs near-optimally on any source without relying on knowledge of the source distribution. Moreover, we present a method of fine-tuning vector-quantized DNNs to recover the accuracy loss due to quantization. Our experiments show that the proposed scheme compresses the MobileNet and ShuffleNet models trained on ImageNet with state-of-the-art compression ratios of 10.7 and 8.8, respectively.
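To illustrate the key idea from the abstract, below is a minimal Python sketch of randomized (dithered) quantization in its one-dimensional (scalar lattice) form: the encoder adds a uniform dither drawn from a seed shared with the decoder, rounds to the nearest lattice point, and keeps only the integer indices for lossless source coding; the decoder subtracts the same dither. The function names, step size, and shared-seed convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def universal_quantize(weights, step, seed=0):
    """Encoder: dithered uniform quantization (1-D special case of
    universal lattice quantization).

    The dither u ~ Uniform(-step/2, step/2) is generated from a seed
    shared with the decoder, so only the integer indices are stored.
    """
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-step / 2, step / 2, size=weights.shape)
    # Round the dithered weights to the nearest lattice point.
    return np.round((weights + dither) / step).astype(np.int64)

def universal_dequantize(indices, step, seed=0):
    """Decoder: regenerate the shared dither and subtract it."""
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-step / 2, step / 2, size=indices.shape)
    return indices * step - dither

# Toy usage on random "weights": the reconstruction error equals the
# rounding error of the dithered input, so it is uniform, independent
# of the source distribution, and bounded by step/2.
w = np.random.randn(1000)
idx = universal_quantize(w, step=0.05)
w_hat = universal_dequantize(idx, step=0.05)
print(np.max(np.abs(w - w_hat)))  # <= 0.025
```

In a full pipeline the indices would then be losslessly compressed with a universal source coder, and the quantized network fine-tuned to recover accuracy, as the abstract describes; both steps are omitted here for brevity.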
ISSN: 1932-4553
eISSN: 1941-0484
DOI: 10.1109/JSTSP.2020.2975903