Neural network model compression method and device, acceleration unit and computing system
An embodiment of the invention discloses a method for compressing a neural network model. The method comprises: acquiring a weight matrix of the neural network model; dividing the weight matrix by rows or columns into a plurality of weight groups, wherein...
Main Authors: | , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online Access: | Order full text |
Summary: | An embodiment of the invention discloses a method for compressing a neural network model. The method comprises: acquiring a weight matrix of the neural network model; dividing the weight matrix by rows or columns into a plurality of weight groups, wherein each weight group comprises a plurality of weight values and the data length of each weight group is determined by the bit width of an acceleration unit that executes operations related to the neural network model; training the neural network model with a weight-group-based sparsification algorithm; performing weight-group-based pruning on the trained weight matrix to obtain a sparse weight matrix; and storing the sparse weight matrix in a predetermined storage format. Embodiments of the invention further disclose a corresponding neural network model compression device, a neural network model operation method, an acceleration unit, a computing system, a |
---|
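
The grouping, pruning, and storage steps named in the summary can be illustrated with a minimal NumPy sketch. It is not the patented method: the record does not give the pruning criterion, the relation between bit width and group length, or the storage format. The sketch assumes row-wise groups, a group length equal to the accelerator bit width divided by the bits per weight, pruning by smallest L2 norm, and a simple coordinate-style layout. All function names (`group_length`, `prune_groups`, `pack`, `unpack`) are hypothetical, and the group-sparsity training step is omitted.

```python
import numpy as np


def group_length(bit_width: int, bits_per_weight: int) -> int:
    """Assumed rule: one group fills exactly one accelerator word."""
    return bit_width // bits_per_weight


def prune_groups(weights: np.ndarray, glen: int, keep_ratio: float):
    """Zero out whole row-wise groups, keeping those with the largest L2 norm.

    The L2-norm criterion is an assumption; the record names no criterion.
    """
    rows, cols = weights.shape
    if cols % glen != 0:
        raise ValueError("column count must be a multiple of the group length")
    groups = weights.reshape(rows, cols // glen, glen)
    norms = np.linalg.norm(groups, axis=2)           # one norm per group
    k = max(1, int(round(keep_ratio * norms.size)))  # number of groups to keep
    threshold = np.sort(norms, axis=None)[-k]        # k-th largest norm
    mask = norms >= threshold                        # True = group survives
    pruned = groups * mask[:, :, None]
    return pruned.reshape(rows, cols), mask


def pack(pruned: np.ndarray, mask: np.ndarray, glen: int) -> dict:
    """Store only surviving groups with their (row, group) coordinates.

    This layout is a stand-in for the unspecified predetermined storage format.
    """
    rows, cols = pruned.shape
    groups = pruned.reshape(rows, cols // glen, glen)
    row_idx, grp_idx = np.nonzero(mask)
    return {
        "shape": (rows, cols),
        "glen": glen,
        "row": row_idx,
        "group": grp_idx,
        "values": groups[row_idx, grp_idx],  # shape: (kept groups, glen)
    }


def unpack(packed: dict) -> np.ndarray:
    """Rebuild the dense sparse-weight matrix from the packed form."""
    rows, cols = packed["shape"]
    glen = packed["glen"]
    dense = np.zeros((rows, cols // glen, glen), dtype=packed["values"].dtype)
    dense[packed["row"], packed["group"]] = packed["values"]
    return dense.reshape(rows, cols)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 32)).astype(np.float32)
    glen = group_length(bit_width=128, bits_per_weight=32)
    pruned, mask = prune_groups(w, glen, keep_ratio=0.25)
    packed = pack(pruned, mask, glen)
    print("groups kept:", int(mask.sum()), "of", mask.size)
    print("round trip ok:", np.array_equal(unpack(packed), pruned))
```

Under these assumptions, every stored group is one accelerator word wide, so a hardware unit could fetch surviving groups without partial reads. Pruning individual weights would not give that alignment.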