MIXING SPARSITY COMPRESSION

Techniques for compressing a neural network model by mixing compression ratios (sparsity patterns) are described. The weight tensor of a neural network model is divided into weight groups. The pruning cost of compressing the weight values according to a compression ratio is determined for each weigh...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: HUANG, Randy Renfu, MEYER, Paul Gilbert, PEKHIMENKO, Gennady, TAN, Xiaodan
Format: Patent
Sprache:eng ; fre ; ger
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Techniques for compressing a neural network model by mixing compression ratios (sparsity patterns) are described. The weight tensor of a neural network model is divided into weight groups. The pruning cost of compressing the weight values according to a compression ratio is determined for each weight group, and a pruning cost distribution for the compression ratio is generated from the pruning costs of the weight groups. A cost threshold can then be selected from the pruning cost distribution, and weight groups having a pruning cost below the selected cost threshold are compressed according to the compression ratio. The remaining weight groups can be compressed using one or more less aggressive compression ratios. The cost threshold can be adjusted to tune the overall sparsity and accuracy of the compressed neural network.