Automated Quantization and Retraining for Neural Network Models Without Labeled Data

Bibliographic details
Published in: IEEE Access, 2022-01, Vol. 10, pp. 73818-73834
Authors: Thonglek, Kundjanasith; Takahashi, Keichi; Ichikawa, Kohei; Nakasan, Chawanat; Nakada, Hidemoto; Takano, Ryousei; Leelaprute, Pattara; Iida, Hajimu
Format: Article
Language: English
Online access: Full text
Abstract: Deploying neural network models to edge devices is becoming increasingly popular because such deployment reduces response time and improves the data privacy of services. However, running large models on edge devices is challenging because of limited computing resources and storage space. Researchers have therefore proposed various model compression methods to reduce the model size. To balance the trade-off between model size and accuracy, conventional model compression methods require manual effort to find the optimal configuration that reduces the model size without significantly degrading accuracy. In this article, we propose a method to automatically find optimal configurations for quantization. The proposed method suggests multiple compression configurations that produce models with different sizes and accuracies, from which users can select the configuration that suits their use case. Additionally, we propose a retraining method that does not require any labeled dataset. We evaluated the proposed method using various neural network models for classification, regression, and semantic similarity tasks, and demonstrated that it reduced model size by at least 30% while keeping the loss of accuracy below 1%. We compared the proposed method with state-of-the-art automated compression methods and showed that it provides better compression configurations than existing methods.
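
The abstract does not spell out how retraining proceeds without labeled data. A common realization of this idea is to use the original, uncompressed model's outputs as soft targets for the compressed model (a distillation-style objective). The sketch below illustrates that assumption in PyTorch; the models, layer sizes, and loss function are illustrative stand-ins, not the paper's actual procedure.

```python
# Hypothetical sketch of label-free retraining: the compressed model is
# trained to match the original model's output distribution on unlabeled
# inputs. Architectures and hyperparameters here are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for the original (full-size) and compressed models.
original = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
compressed = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

original.eval()  # the original model only supplies target outputs
optimizer = torch.optim.Adam(compressed.parameters(), lr=1e-3)

# Unlabeled inputs; no ground-truth labels are ever used.
unlabeled = torch.randn(256, 64)

for epoch in range(10):
    for batch in unlabeled.split(32):
        with torch.no_grad():
            target = F.log_softmax(original(batch), dim=1)
        pred = F.log_softmax(compressed(batch), dim=1)
        # KL divergence between the two output distributions acts as
        # the retraining loss in place of a supervised objective.
        loss = F.kl_div(pred, target, log_target=True,
                        reduction="batchmean")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because the targets come from the original model rather than from ground-truth labels, any pool of unlabeled inputs drawn from the deployment domain can drive the retraining.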
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3190627