COMPRESSING NEURAL NETWORKS THROUGH UNBIASED MINIMUM VARIANCE PRUNING
Format: Patent
Language: English
Abstract: A DNN can be compressed by pruning one or more tensors used in a deep learning operation. A first pruning parameter and a second pruning parameter are determined for a tensor. A vector whose size equals the second pruning parameter may be extracted from the tensor, pruning probabilities may be determined for the elements of the vector, and one or more elements are selected based on those probabilities. Alternatively, a matrix, in lieu of the vector, may be extracted from the tensor; pruning probabilities are then determined for the columns of the matrix, and one or more columns are selected based on their probabilities. The number of selected elements or columns may equal the first pruning parameter. The tensor can be modified by modifying the values of the selected elements or columns and setting the values of one or more unselected elements or columns to zero.
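The abstract does not spell out how the pruning probabilities or the modified values are computed. A minimal sketch, under the assumption that probabilities are proportional to element magnitude and that kept elements are rescaled by the inverse of their inclusion probability (the standard construction that makes the pruned vector an unbiased, low-variance estimate of the original), might look like this; the function name, the magnitude heuristic, and the use of independent Bernoulli draws (so that `n_keep` elements survive only in expectation, whereas the patent selects exactly that many) are all illustrative assumptions, not the patented method:

```python
import numpy as np

def unbiased_prune(v, n_keep, rng):
    """Stochastically prune a vector extracted from a tensor.

    Assumed scheme: inclusion probabilities proportional to |v_i|
    (capped at 1), independent Bernoulli selection, and kept values
    rescaled by 1/p_i so that E[output] == v element-wise.
    """
    v = np.asarray(v, dtype=float)
    mags = np.abs(v)
    total = mags.sum()
    if total == 0.0:
        # Nothing to keep: the all-zero vector is already "pruned".
        return np.zeros_like(v)
    # Probabilities proportional to magnitude, targeting n_keep survivors
    # in expectation; capped at 1 so large elements are always kept.
    p = np.minimum(1.0, n_keep * mags / total)
    keep = rng.random(v.shape) < p
    out = np.zeros_like(v)
    # Rescale survivors by 1/p: E[out_i] = p_i * (v_i / p_i) = v_i.
    out[keep] = v[keep] / p[keep]
    return out
```

The column variant in the abstract would apply the same selection per column of the extracted matrix instead of per element; averaging many pruned samples recovers the original vector, which is the "unbiased" property the title refers to.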