Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery
Saved in:
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We propose a novel algorithm for combined unit/filter and layer pruning of
deep neural networks that operates during training and does not require a
pre-trained network. Our algorithm optimally trades off learning
accuracy and pruning levels while balancing layer vs. unit/filter pruning and
computational vs. parameter complexity using only three user-defined
parameters, which are easy to interpret and tune. The optimal network structure
is found as the solution of a stochastic optimization problem over the network
weights and the parameters of variational Bernoulli distributions for 0/1
random variables scaling the units and layers of the network. Pruning occurs
when a variational parameter converges to 0, rendering the corresponding
structure permanently inactive, thus saving computations during training and
prediction. A key contribution of our approach is a cost function that
combines the objectives of prediction accuracy and network pruning in a
computational/parameter complexity-aware manner, together with the automatic
selection of the many regularization parameters. We show that the solutions of the
optimization problem to which the algorithm converges are deterministic
networks. We analyze the ODE system that underlies our stochastic optimization
algorithm and establish domains of attraction around zero for the dynamics of
the network parameters. These results provide theoretical support for safely
pruning units/filters and/or layers during training and lead to practical
pruning conditions. We evaluate our method on the CIFAR-10/100 and ImageNet
datasets using ResNet architectures and demonstrate that our method improves
upon layer-only or unit-only pruning and, in terms of pruning ratios and test
accuracy, competes favorably with combined unit/filter and layer pruning
algorithms that require pre-trained networks. |
---|---|
DOI: | 10.48550/arxiv.2411.09127 |
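
The gating mechanism described in the summary (0/1 Bernoulli random variables scaling units/filters and layers, with a structure pruned once its variational parameter collapses to 0) can be illustrated with a short sketch. The code below is a minimal toy in PyTorch and not the authors' implementation: the class name `BernoulliGate`, the straight-through gradient estimator, the `prune_threshold` value, and the NCHW broadcasting are assumptions made here purely for illustration.

```python
# Illustrative sketch only (assumptions, not the paper's code): a trainable
# Bernoulli gate that scales filters (per-channel) or a whole layer (scalar),
# and is permanently deactivated once its probability collapses toward 0.
import math
import torch
import torch.nn as nn

class BernoulliGate(nn.Module):
    def __init__(self, num_gates: int, init_prob: float = 0.9,
                 prune_threshold: float = 1e-3):
        super().__init__()
        # Unconstrained logit; sigmoid maps it to the Bernoulli probability theta.
        init_logit = math.log(init_prob / (1.0 - init_prob))
        self.logit = nn.Parameter(torch.full((num_gates,), init_logit))
        # Gates that have been pruned stay off for the rest of training/inference.
        self.register_buffer("active", torch.ones(num_gates, dtype=torch.bool))
        self.prune_threshold = prune_threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = torch.sigmoid(self.logit) * self.active
        if self.training:
            # Straight-through estimator: forward uses the 0/1 sample,
            # backward uses the gradient of theta.
            z = torch.bernoulli(theta)
            z = z + theta - theta.detach()
        else:
            # At convergence the gates are expected to be deterministic (0 or 1).
            z = (theta > 0.5).float()
        # Broadcast over the channel dimension of an (N, C, H, W) activation.
        return x * z.view(1, -1, 1, 1)

    @torch.no_grad()
    def prune(self) -> None:
        # Permanently deactivate gates whose probability has collapsed to ~0.
        self.active &= torch.sigmoid(self.logit) > self.prune_threshold
```

In such a sketch, a scalar gate (num_gates=1) placed on a residual block's branch would prune the whole block when it closes, while a per-channel gate on a convolution's output would prune individual filters; the complexity-aware regularizer on the gate probabilities that the summary describes would be added to the training loss, which is not shown here.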