COMPRESSING NEURAL NETWORKS THROUGH UNBIASED MINIMUM VARIANCE PRUNING



Bibliographic Details
Main Authors: Banner, Ron, Chmiel, Brian, Hubara, Itay
Format: Patent
Language: English
Description
Summary: A DNN can be compressed by pruning one or more tensors for a deep learning operation. A first pruning parameter and a second pruning parameter are determined for a tensor. A vector having a size of the second pruning parameter may be extracted from the tensor. Pruning probabilities may be determined for the elements in the vector. One or more elements in the vector are selected based on the pruning probabilities. Alternatively, a matrix, in lieu of the vector, may be extracted from the tensor. Pruning probabilities may be determined for the columns in the matrix. One or more columns are selected based on their pruning probabilities. The number of the selected element(s) or column(s) may equal the first pruning parameter. The tensor can be modified by modifying the value(s) of the selected element(s) or column(s) and setting the value(s) of one or more unselected elements or columns to zero.
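The summary's core idea — select elements of a vector stochastically by pruning probability, rescale the survivors, and zero the rest so the result is unbiased in expectation — can be sketched as follows. This is a minimal illustration using magnitude-proportional Poisson sampling with 1/p rescaling; the function name, the probability formula, and the parameters are assumptions for illustration, not the patent's exact procedure:

```python
import random

def stochastic_prune(vector, n_keep, rng=None):
    """Hypothetical sketch of unbiased stochastic pruning: each element
    survives with probability proportional to its magnitude (capped at 1),
    surviving values are rescaled by 1/p, and the rest are set to zero.
    On average about n_keep elements survive, and the expected value of
    the pruned vector equals the original vector (an unbiased estimator)."""
    rng = rng or random.Random()
    total = sum(abs(x) for x in vector)
    # Inclusion probability per element; capping at 1 keeps it a valid
    # probability while targeting n_keep expected survivors.
    probs = [min(1.0, n_keep * abs(x) / total) for x in vector]
    out = []
    for x, p in zip(vector, probs):
        if p > 0 and rng.random() < p:
            out.append(x / p)   # rescale kept value -> unbiasedness
        else:
            out.append(0.0)     # prune: set unselected value to zero
    return out
```

Rescaling by 1/p is what makes the estimator unbiased: E[x/p · 1{kept}] = x for every element, so averaging many pruned copies recovers the original vector even though each copy is mostly zeros.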