Effective Layer Pruning Through Similarity Metric Perspective
Format: Article
Language: English
Abstract: Deep neural networks have been the predominant paradigm in machine learning
for solving cognitive tasks. Such models, however, are constrained by high
computational overhead, which limits their applicability and hinders advancements
in the field. Extensive research has demonstrated that pruning structures from
these models is a straightforward approach to reducing network complexity. In
this direction, most efforts focus on removing weights or filters. Studies have
also been devoted to layer pruning, as it yields greater computational gains.
However, layer pruning often hurts the network's predictive ability (i.e.,
accuracy) at high compression rates. This work introduces an effective
layer-pruning strategy that meets all the underlying properties pursued by pruning
methods. Our method estimates the relative importance of a layer using the
Centered Kernel Alignment (CKA) metric, which measures the similarity
between the representations of the unpruned model and a candidate layer for
pruning. We confirm the effectiveness of our method on standard architectures
and benchmarks, where it outperforms existing layer-pruning strategies and
other state-of-the-art pruning techniques. In particular, we remove more than
75% of computation while improving predictive ability. At higher compression
regimes, our method exhibits a negligible accuracy drop, whereas other methods
notably deteriorate model accuracy. Beyond these benefits, our pruned
models exhibit robustness to adversarial and out-of-distribution samples.
DOI: 10.48550/arxiv.2405.17081
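The abstract describes scoring layers by the CKA similarity between representations. The sketch below is a minimal, dependency-free illustration of that idea under stated assumptions. It uses the linear-kernel variant of CKA, and it assumes that a "candidate" is the representation produced by the network with one layer removed, computed on the same inputs as the unpruned model. A higher CKA then means removing that layer changes the representation little, so the layer ranks first for pruning. The helper `rank_layers_for_pruning` and the toy matrices are hypothetical. The paper's exact kernel, the layers at which representations are compared, and the pruning criterion may differ.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = examples, columns = features


def center_columns(x: Matrix) -> Matrix:
    """Subtract each feature's (column's) mean across examples."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]


def cross_frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a^T b: the sum over feature pairs (i, j)
    of (sum over examples k of a[k][i] * b[k][j]) ** 2."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            dot = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations of the same n examples.

    With column-centred x and y, ||y^T x||_F^2 is proportional to the
    linear-kernel HSIC. The normalisation constant cancels in the ratio.
    """
    if len(x) != len(y):
        raise ValueError("x and y must describe the same number of examples")
    xc, yc = center_columns(x), center_columns(y)
    hsic_xy = cross_frobenius_sq(xc, yc)
    denom = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    if denom == 0.0:
        raise ValueError("a representation is constant across examples")
    return hsic_xy / denom


def rank_layers_for_pruning(
    reference: Matrix, candidates: Dict[str, Matrix]
) -> List[Tuple[str, float]]:
    """Hypothetical helper. `candidates` maps a layer name to the representation
    of the network with that layer removed. Layers whose removal keeps CKA to
    the unpruned reference highest are listed first, as the least important."""
    scores = [(name, linear_cka(reference, rep)) for name, rep in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-ins for feature matrices (4 examples each); not real model outputs.
    reference = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
    candidates = {
        "layer_a": [[2.0 * v for v in row] for row in reference],  # scaled copy
        "layer_b": [[1.0], [0.0], [0.0], [1.0]],                   # unrelated
    }
    for name, score in rank_layers_for_pruning(reference, candidates):
        print(f"{name}: CKA = {score:.3f}")
```

Linear CKA is invariant to isotropic scaling and orthogonal transforms of the features. That is why the scaled copy `layer_a` scores 1 and ranks ahead of `layer_b`. For real networks the O(n·p²) pure-Python loops would normally be replaced by matrix operations on activations collected from the model.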