PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning

Although pruning is an effective technique to reduce the number of weights in deep neural networks (DNNs), it remains challenging for the resulting sparse networks to perform low-latency inference on everyday hardware. This problem is mainly caused by the incompatibility between the unstructured spa...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on computers 2024-11, Vol.73 (11), p.2576-2589
Hauptverfasser:	Geng, Hanfei, Liu, Yifei, Zheng, Yujie, Zhang, Li Lyna, Sun, Jingwei, Wang, Yujing, Wang, Yang, Sun, Guangzhong, Yang, Mao, Cao, Ting, Liu, Yunxin
Format:	Artikel
Sprache:	eng
Schlagworte:	Accuracy Artificial neural networks block pruning Deep neural network Hardware Optimization Shape sparse kernels Sparse matrices Sun
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Although pruning is an effective technique to reduce the number of weights in deep neural networks (DNNs), it remains challenging for the resulting sparse networks to perform low-latency inference on everyday hardware. This problem is mainly caused by the incompatibility between the unstructured sparsity adopted for accuracy preservation and the sparse platform's (the combination of sparse kernel library and the underlying hardware) expectation of regular sparse patterns. In order to resolve this conflict, we propose PruneAug, an augmentation over existing unstructured pruning methods that finds block-sparse networks with much lower latency but preserves the accuracy. The fundamental idea of PruneAug is to prune the network with a layerwise block dimension assignment in a platform-aware fashion. Subject to an accuracy-loss constraint, PruneAug minimizes the latency of the block sparse network by jointly optimizing this layerwise block dimension assignment and the network's sparsity level. Admittedly, this approach expands the solution space. To curb our search cost, we include multiple optimizations while designing PruneAug's search space and strategy. Our evaluation over diverse pruning methods, DNNs, datasets, and sparse platforms shows that PruneAug enables different pruning methods to achieve speedup (as much as \boldsymbol{\sim}13\boldsymbol{\times} ∼13× depending on the platform) while maintaining competitive accuracy relative to unstructured sparsity, extracting the full potential of sparse platforms.
ISSN:	0018-9340 1557-9956
DOI:	10.1109/TC.2024.3441855