The State of Sparsity in Deep Neural Networks
Main author(s): | , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | We rigorously evaluate three state-of-the-art techniques for inducing
sparsity in deep neural networks on two large-scale learning tasks: Transformer
trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.
Across thousands of experiments, we demonstrate that complex techniques
(Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression
rates on smaller datasets perform inconsistently, and that simple magnitude
pruning approaches achieve comparable or better results. Additionally, we
replicate the experiments performed by Frankle & Carbin (2018) and Liu et
al. (2018) at scale and show that unstructured sparse architectures learned
through pruning cannot be trained from scratch to the same test set performance
as a model trained with joint sparsification and optimization. Together, these
results highlight the need for large-scale benchmarks in the field of model
compression. We open-source our code, top performing model checkpoints, and
results of all hyperparameter configurations to establish rigorous baselines
for future work on compression and sparsification. |
---|---|
DOI: | 10.48550/arxiv.1902.09574 |
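
For readers unfamiliar with the baseline highlighted in the abstract, the sketch below illustrates unstructured magnitude pruning: the weights with the smallest absolute values are zeroed until a target sparsity is reached. This is a minimal NumPy illustration only; the function name, the 80% sparsity target, and the one-shot application (rather than gradual pruning during training, i.e. joint sparsification and optimization) are assumptions for the example, not details of the paper's implementation.

```python
# Minimal sketch of unstructured magnitude pruning (illustrative, not the paper's code).
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of entries with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # Threshold is the k-th smallest magnitude; entries at or below it are pruned.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Example: prune a random weight matrix to roughly 80% sparsity (assumed target).
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
w_pruned, mask = magnitude_prune(w, sparsity=0.8)
print(f"sparsity achieved: {1.0 - mask.mean():.2%}")
```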