Robust Learning of Parsimonious Deep Neural Networks
Format: | Article |
Language: | English |
Abstract: | We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network during the early stages of training. As a result, the computational cost of subsequent training iterations, as well as that of inference, is considerably reduced. Our method, based on variational inference principles using Gaussian scale mixture priors on neural network weights, learns the variational posterior distribution of Bernoulli random variables multiplying the units/filters, similarly to adaptive dropout. Our algorithm ensures that the Bernoulli parameters practically converge to either 0 or 1, establishing a deterministic final network. We analytically derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection and leads to consistent pruning levels and prediction accuracy regardless of weight initialization or the size of the starting network. We prove the convergence properties of our algorithm, establishing theoretical and practical pruning conditions. We evaluate the proposed algorithm on the MNIST and CIFAR-10 data sets and the commonly used fully connected and convolutional LeNet and VGG16 architectures. The simulations show that our method achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, doing so in a manner robust with respect to network initialization and initial size. |
DOI: | 10.48550/arxiv.2205.04650 |
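The abstract describes units/filters multiplied by Bernoulli random variables whose inclusion probabilities are learned and driven toward 0 or 1, so that units with probability near 0 can be removed as a structured group. The following is a minimal sketch of that gating idea, not the authors' exact algorithm: the `GatedLinear` module, the straight-through sampling trick, and the 0.5 pruning threshold are illustrative assumptions, and the variational objective with Gaussian scale mixture priors and the hyper-prior are omitted.

```python
# Minimal sketch (assumed implementation, not the paper's method): a linear layer
# whose output units are multiplied by Bernoulli gates with learnable inclusion
# probabilities, in the spirit of adaptive dropout.
import torch
import torch.nn as nn


class GatedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Logits of the Bernoulli inclusion probabilities theta, one per output unit.
        self.gate_logits = nn.Parameter(torch.zeros(out_features))

    def theta(self):
        return torch.sigmoid(self.gate_logits)

    def forward(self, x):
        h = self.linear(x)
        theta = self.theta()
        if self.training:
            # Sample z ~ Bernoulli(theta); a straight-through estimator passes
            # gradients to theta so the gate probabilities can be learned.
            z = torch.bernoulli(theta)
            z = z + theta - theta.detach()
        else:
            # At evaluation time, gates whose theta converged toward 0 silence
            # their units; the corresponding rows of the weight matrix (and the
            # matching columns of the next layer) can be pruned outright.
            z = (theta > 0.5).float()
        return h * z


# Usage: units whose theta collapses toward 0 during training are candidates for
# structured removal, shrinking subsequent training and inference cost.
layer = GatedLinear(784, 300)
x = torch.randn(32, 784)
out = layer(x)                  # training-mode stochastic gating
keep = layer.theta() > 0.5      # mask of units to retain after pruning
print(out.shape, int(keep.sum()))
```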