Neural network model compression method and device, acceleration unit and computing system

Bibliographic details
Main authors: YAN CHENGYANG, LI YINGMIN, TU XIAOBIN, LAO MAOYUAN, MAO JUNWEI, ZHANG WEIFENG
Format: Patent
Language: Chinese; English
Abstract: An embodiment of the invention discloses a method for compressing a neural network model. The method comprises the following steps: acquiring a weight matrix of the neural network model; dividing the weight matrix, by rows or by columns, into a plurality of weight groups, wherein each weight group comprises a plurality of weight values, the data length of each weight group is determined by the bit width of an acceleration unit, and the acceleration unit is used to execute operations related to the neural network model; training the neural network model with a weight-group-based sparsification algorithm; performing weight-group-based pruning on the trained weight matrix to obtain a sparse weight matrix; and storing the sparse weight matrix in a predetermined storage format. Embodiments of the invention further disclose a corresponding neural network model compression device, a neural network model operation method, an acceleration unit, a computing system, a
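The abstract does not say how the group length follows from the bit width, which sparsification algorithm is used, what the pruning criterion is, or what the storage format looks like. The following is only a minimal Python sketch of the overall pipeline under loudly stated assumptions, not the patented method. It assumes: (1) group length = accelerator bit width / bits per stored weight (e.g. a 128-bit unit with 8-bit weights gives 16 weights per group); (2) a group-lasso proximal step (group soft-thresholding) stands in for "a weight-group-based sparsification algorithm"; (3) pruning removes the groups with the smallest L2 norm; (4) the "predetermined storage format" is a CSR-like layout at group granularity. All function names (`group_length`, `split_row_groups`, `group_soft_threshold`, `prune_groups`, `to_group_csr`, `spmv`) are hypothetical. Only row-wise grouping is shown; column-wise grouping is the same procedure applied to the transposed matrix.

```python
import math
from typing import List, Tuple

Group = List[float]
Groups = List[List[Group]]  # groups[row][k] = k-th weight group of that row


def group_length(bit_width: int, weight_bits: int) -> int:
    """Weights per group, ASSUMING one group fills one accelerator-wide word."""
    if bit_width % weight_bits != 0:
        raise ValueError("bit_width must be a multiple of weight_bits")
    return bit_width // weight_bits


def split_row_groups(w: List[List[float]], g: int) -> Groups:
    """Divide every row of w into consecutive groups of g weights."""
    cols = len(w[0])
    if cols % g != 0:
        raise ValueError("column count must be a multiple of the group length")
    return [[row[j:j + g] for j in range(0, cols, g)] for row in w]


def l2(group: Group) -> float:
    return math.sqrt(sum(x * x for x in group))


def group_soft_threshold(group: Group, lam: float) -> Group:
    """Group-lasso proximal step (assumed stand-in): shrink the group's L2 norm by lam.

    A group whose norm is <= lam becomes exactly zero, which is how
    group-sparse training drives whole groups to zero.
    """
    norm = l2(group)
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [x * scale for x in group]


def prune_groups(groups: Groups, ratio: float) -> Groups:
    """Zero the fraction `ratio` of groups with the smallest L2 norm (assumed criterion)."""
    ranked = sorted((l2(grp), r, k)
                    for r, row in enumerate(groups) for k, grp in enumerate(row))
    pruned = [[list(grp) for grp in row] for row in groups]
    for _, r, k in ranked[:int(len(ranked) * ratio)]:
        pruned[r][k] = [0.0] * len(pruned[r][k])
    return pruned


def to_group_csr(groups: Groups) -> Tuple[List[Group], List[int], List[int]]:
    """Hypothetical storage format: nonzero groups, their group index, row pointers."""
    values: List[Group] = []
    group_idx: List[int] = []
    row_ptr = [0]
    for row in groups:
        for k, grp in enumerate(row):
            if any(x != 0.0 for x in grp):
                values.append(grp)
                group_idx.append(k)
        row_ptr.append(len(values))
    return values, group_idx, row_ptr


def spmv(values: List[Group], group_idx: List[int], row_ptr: List[int],
         x: List[float], g: int) -> List[float]:
    """y = W x from the compressed arrays; each stored group is one g-wide dot product."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for n in range(row_ptr[r], row_ptr[r + 1]):
            base = group_idx[n] * g  # first column covered by this group
            acc += sum(v * x[base + i] for i, v in enumerate(values[n]))
        y.append(acc)
    return y


if __name__ == "__main__":
    W = [[0.9, -0.8, 0.7, 0.6, 0.01, -0.02, 0.03, 0.01],
         [0.02, 0.01, -0.01, 0.02, 0.5, 0.4, -0.6, 0.3]]
    g = group_length(bit_width=32, weight_bits=8)  # 4 weights per group
    groups = split_row_groups(W, g)
    # Stand-in for the training stage: one proximal step, no gradient update.
    groups = [[group_soft_threshold(grp, lam=0.05) for grp in row] for row in groups]
    vals, idx, ptr = to_group_csr(prune_groups(groups, ratio=0.5))
    print(idx, ptr)  # [0, 1] [0, 1, 2]
```

Tying the group length to the accelerator's word width means every stored group can be loaded and multiplied as one full-width vector operation, and the index arrays carry one entry per group rather than one per weight. In the demo, only the large-norm first group of row 0 and second group of row 1 survive, so the compressed form keeps 2 of the 4 groups.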