A novel iteration scheme with conjugate gradient for faster pruning on transformer models


Bibliographic details
Published in: Complex & Intelligent Systems 2024-12, Vol. 10 (6), p. 7863-7875
Main authors: Li, Jun; Zhu, Yuchen; Sun, Kexue
Format: Article
Language: English
Description
Abstract: Pre-trained models based on the Transformer architecture have significantly advanced research within the domain of Natural Language Processing (NLP) due to their superior performance and extensive applicability across multiple technological sectors. Despite these advantages, there is a significant challenge in optimizing these models for more efficient deployment. To be concrete, the existing post-training pruning frameworks of transformer models suffer from inefficiencies in the crucial stage of pruning accuracy recovery, which impacts the overall pruning efficiency. To address this issue, this paper introduces a novel and efficient iteration scheme with conjugate gradient in the pruning recovery stage. By constructing a series of conjugate iterative directions, this approach ensures each optimization step is orthogonal to the previous ones, which effectively reduces redundant explorations of the search space. Consequently, each iteration progresses effectively towards the global optimum, thereby significantly enhancing search efficiency. The conjugate gradient-based faster-pruner reduces the time expenditure of the pruning process while maintaining accuracy, demonstrating a high degree of solution stability and exceptional model acceleration effects. In pruning experiments conducted on the BERT BASE and DistilBERT models, the faster-pruner exhibited outstanding performance on the GLUE benchmark dataset, achieving a reduction of up to 36.27% in pruning time and a speed increase of up to 1.45× on an RTX 3090 GPU.
ISSN: 2199-4536, 2198-6053
DOI: 10.1007/s40747-024-01595-w
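
Illustrative note: the abstract refers to constructing conjugate search directions. The sketch below is a textbook conjugate gradient solver on a toy 2×2 system, not the paper's faster-pruner; the function name `conjugate_gradient`, the helper `dot`, and the example matrix are assumptions made purely for illustration. It shows that successive directions are conjugate with respect to the system matrix A (that is, p_i · A p_j = 0 for i ≠ j), which is the sense in which each step is "orthogonal" to the earlier ones and avoids re-searching directions already covered.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A.

    matvec(v) must return A v. Returns the solution and the list of
    search directions used, so their conjugacy can be inspected.
    """
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(b)        # residual b - A x for the starting point x = 0
    p = list(r)        # first direction is the steepest-descent direction
    rs_old = dot(r, r)
    directions = []
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)          # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        directions.append(p)
        beta = rs_new / rs_old               # keeps the next p conjugate to earlier ones
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, directions


# Toy example (hypothetical data): A is symmetric positive-definite.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def matvec(v):
    return [dot(row, v) for row in A]


x, directions = conjugate_gradient(matvec, b)
conjugacy = dot(directions[0], matvec(directions[1]))
print(x, conjugacy)
```

For an n-dimensional symmetric positive-definite system, this scheme reaches the solution in at most n steps in exact arithmetic, which is why the toy 2×2 case finishes after two directions.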