DReP: Deep ReLU pruning for fast private inference

Bibliographic details
Published in:	Journal of Systems Architecture, 2024-07, Vol. 152, p. 103156, Article 103156
Authors:	Hu, Peng; Sun, Lei; Hu, Cuiyun; Dai, Leyu; Guo, Song; Yu, Miao
Format:	Article
Language:	English
Keywords:	Neural network; Non-retraining and non-redesign; Private inference; ReLU pruning

Description
Abstract:	With increasing concerns about privacy in deep learning, privacy-preserving neural network inference has been receiving growing attention from the community, but existing implementations lack practicality due to high latency and high cost. Reducing the number of ReLUs in a neural network is an effective way to achieve latency-efficient private inference (PI). Existing ReLU-reduction methods for efficient PI usually disregard the benefits of pre-trained models and may introduce additional complexity in the era of large models. In this paper, we propose a novel method called DReP, which leverages the output states of neurons to perform deep pruning of ReLUs in neural networks for Non-Retraining and Non-Redesign (NTND) scenarios. DReP is based on the identification of a consistent correlation between the average neuronal output states and the importance of ReLUs, and it enables deep ReLU pruning on top of structured ReLU pre-pruning methods. Notably, DReP requires neither modification of the network model structure nor extensive retraining, in line with the requirements of NTND applications. Experimental results demonstrate that our method reduces the number of ReLUs in the original network by 16.66× while maintaining network accuracy. Moreover, compared with state-of-the-art NTND methods, our approach improves the pruning rate by 1.57× to 2.32× while preserving comparable private inference accuracy.
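The abstract does not give DReP's exact scoring rule or thresholds. As a rough illustration of the general idea of ranking ReLUs by their average output state without retraining, the Python sketch below rests on several assumptions that are not taken from the paper. It assumes that pre-activation statistics are collected on a small calibration set, that a ReLU's "average output state" is the fraction of samples on which its input is positive, and that a ReLU active on nearly all samples behaves almost like the identity and can therefore be linearized (replaced by identity). The function names and the 0.95 threshold are hypothetical.

```python
from typing import List, Sequence


def average_output_state(pre_activations: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of calibration samples on which each ReLU is active (input > 0).

    pre_activations[s][j] is the input to ReLU j for calibration sample s.
    (Hypothetical definition of "average output state" for illustration.)
    """
    n_samples = len(pre_activations)
    n_relus = len(pre_activations[0])
    active_counts = [0] * n_relus
    for sample in pre_activations:
        for j, x in enumerate(sample):
            if x > 0:
                active_counts[j] += 1
    return [count / n_samples for count in active_counts]


def select_relus_to_linearize(states: Sequence[float], threshold: float = 0.95) -> List[int]:
    """Indices of ReLUs that are almost always active and thus nearly linear.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    return [j for j, p in enumerate(states) if p >= threshold]


def pruned_forward(x: Sequence[float], linearized: Sequence[int]) -> List[float]:
    """Apply ReLU elementwise, except at linearized positions, which pass through unchanged."""
    lin = set(linearized)
    return [v if j in lin else max(v, 0.0) for j, v in enumerate(x)]


def relu_reduction(total_relus: int, linearized: Sequence[int]) -> float:
    """Reduction factor: original ReLU count divided by remaining ReLU count."""
    remaining = total_relus - len(linearized)
    return float("inf") if remaining == 0 else total_relus / remaining


if __name__ == "__main__":
    # Toy calibration set: 4 samples, 3 ReLUs.
    calib = [[1.0, -0.5, 2.0], [0.5, 0.3, 1.5], [2.0, -1.0, 0.1], [0.7, 0.2, 3.0]]
    states = average_output_state(calib)           # [1.0, 0.5, 1.0]
    linearized = select_relus_to_linearize(states)  # [0, 2]
    print(states, linearized)
    print(pruned_forward([-1.0, -2.0, 4.0], linearized))  # [-1.0, 0.0, 4.0]
    print(relu_reduction(3, linearized))                 # 3.0
```

In this toy run, two of the three ReLUs are always active on the calibration data and are linearized, leaving one ReLU and giving a 3× reduction; the paper's reported 16.66× figure comes from its own method and models, not from this heuristic.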
ISSN:	1383-7621 1873-6165
DOI:	10.1016/j.sysarc.2024.103156