Pruning-and-distillation: One-stage joint compression framework for CNNs via clustering

Bibliographic Details
Published in: Image and Vision Computing, 2023-08, Vol. 136, p. 104743, Article 104743
Authors: Niu, Tao; Teng, Yinglei; Jin, Lei; Zou, Panpan; Liu, Yiding
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Network pruning and knowledge distillation, as two effective network compression techniques, have drawn extensive attention due to their success in reducing model complexity. However, previous works treat them as independent methods and combine them in an isolated rather than joint manner, leading to sub-optimal optimization. In this paper, we propose a collaborative compression scheme named Pruning-and-Distillation via Clustering (PDC), which integrates pruning and distillation into an end-to-end, single-stage framework that exploits the advantages of both. Specifically, instead of directly deleting or zeroing out unimportant filters within each layer, we reconstruct them based on clustering, which preserves the learned features as much as possible. Guidance from the teacher is integrated into the pruning process to further improve the generalization of the pruned model, which alleviates the randomness caused by reconstruction to some extent. After convergence, we can equivalently remove the reconstructed filters within each cluster through the proposed channel addition operation. Benefiting from this equivalence, we no longer require a time-consuming fine-tuning step to regain accuracy. Extensive experiments on the CIFAR-10/100 and ImageNet datasets show that our method achieves the best trade-off between performance and complexity compared with other state-of-the-art algorithms. For example, for ResNet-110, we achieve a 61.5% FLOPs reduction with even a 0.14% top-1 accuracy increase on CIFAR-10, and remove 55.2% of FLOPs with only a 0.32% accuracy drop on CIFAR-100.
• Integrate pruning and distillation into a one-stage framework for collaborative compression.
• Reconstruct redundant filters instead of removing them, to preserve learned knowledge.
• Equivalently remove reconstructed filters via the channel addition operation.
• Distill with the original model to avoid the mismatch problem of vanilla distillation.
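
The sketch below is a rough illustration of the equivalence described in the abstract, not the authors' code: filters of one convolutional layer are grouped by k-means and overwritten with their cluster centroids, and the duplicates are then removed by summing the following layer's kernels over the now-identical channels (one reading of the "channel addition" idea). The layer sizes, the cluster count, and the use of PyTorch with scikit-learn's KMeans are assumptions made for the example.

# Minimal sketch (assumed setup, not the PDC implementation): cluster the
# filters of conv1, replace each filter with its centroid, then drop the
# duplicates by summing conv2's input-channel kernels within each cluster.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

torch.manual_seed(0)

# Two consecutive conv layers; biases omitted to keep the equivalence exact.
conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
conv2 = nn.Conv2d(8, 4, kernel_size=3, padding=1, bias=False)

# 1) Cluster conv1's flattened filters and overwrite each with its centroid,
#    so filters in the same cluster produce identical feature maps.
n_clusters = 5  # hypothetical budget: keep 5 of the 8 filters
w1 = conv1.weight.data                               # (8, 3, 3, 3)
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
km.fit(w1.reshape(w1.size(0), -1).numpy())
labels = torch.as_tensor(km.labels_, dtype=torch.long)
centroids = torch.as_tensor(km.cluster_centers_, dtype=w1.dtype)
conv1.weight.data = centroids[labels].reshape_as(w1)

# 2) "Channel addition": keep one representative filter per cluster and, in
#    conv2, sum the kernels that read the duplicated feature maps.
keep = [int((labels == c).nonzero()[0]) for c in range(n_clusters)]
w2 = conv2.weight.data                               # (4, 8, 3, 3)
w2_new = torch.stack(
    [w2[:, labels == c, :, :].sum(dim=1) for c in range(n_clusters)], dim=1
)

pruned1 = nn.Conv2d(3, n_clusters, kernel_size=3, padding=1, bias=False)
pruned1.weight.data = conv1.weight.data[keep].clone()
pruned2 = nn.Conv2d(n_clusters, 4, kernel_size=3, padding=1, bias=False)
pruned2.weight.data = w2_new

# The pruned pair matches the clustered original pair, so no fine-tuning is
# needed to recover from the removal step itself.
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    assert torch.allclose(conv2(conv1(x)), pruned2(pruned1(x)), atol=1e-5)
print("outputs match after channel addition")

Because every filter in a cluster has been replaced by the same centroid, their output feature maps are identical, and a convolution applied to duplicated maps can be collapsed by adding the corresponding kernels; this is the exactness that, per the abstract, lets the reconstructed filters be removed without a separate fine-tuning stage.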
ISSN: 0262-8856, 1872-8138
DOI: 10.1016/j.imavis.2023.104743