Advances in Pruning and Quantization for Natural Language Processing

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 139113-139128
Authors: Bibi, Ummara; Mazhar, Mahrukh; Sabir, Dilshad; Fasih Uddin Butt, Muhammad; Hassan, Ali; Ali Ghazanfar, Mustansar; Ali Khan, Arshad; Abdul, Wadood
Format: Article
Language: English
Description
Abstract: With ongoing advancements in natural language processing (NLP) and deep learning methods, the demand for computational and memory resources has increased considerably, which underscores the need for efficient and compact models in resource-constrained environments. This paper provides a comprehensive overview of the most recent advancements in pruning and quantization methods for deep neural networks. Numerous cutting-edge techniques that harness the complementary advantages of pruning and quantization are analyzed, highlighting their effectiveness in reducing model size, enhancing computational efficiency, and minimizing memory usage. These techniques include Quantization and Sparsity Aware Fine Tuning, Compression Learning by In-Parallel Pruning-Quantization, GroupReduce, Quantization-Pruned Attention, Structured Pruning and Normalized Linear Quantization (Prune and NLQ), Quantization and Pruning for Sentiment Analysis, an Automatic mixed-precision Quantization approach for BERT compression (AQ-BERT), Mixed Precision Quantization, Unstructured Pruning and Quantization, and Magnitude Pruning. The datasets utilized, models employed, and outcomes achieved are also examined. The utilization of pruning and quantization techniques across diverse deep-learning tasks, NLP, and sentiment analysis is also discussed. Moreover, issues such as compatibility with hardware configurations, optimization complexities, accuracy degradation, and other constraints are analyzed. Several challenges and limitations of weight or unit pruning for optimizing memory, and of quantization techniques for enhancing precision, are explored. The in-depth analysis of these state-of-the-art techniques and experiments provides a broad understanding of the field. Furthermore, strategies to effectively reduce the computational and memory demands of neural networks without compromising their performance are also analyzed.
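
As a rough illustration of the two core operations surveyed in the abstract, the following sketch applies global magnitude pruning followed by simple symmetric 8-bit post-training quantization to a weight matrix in PyTorch. It is not taken from the paper; the layer shape, the 50% sparsity level, and the int8 scheme are illustrative assumptions.

import torch

def magnitude_prune(weight, sparsity):
    # Zero out roughly the `sparsity` fraction of weights with the smallest magnitudes.
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    # Keep only weights whose magnitude exceeds the k-th smallest magnitude.
    mask = weight.abs() > threshold
    return weight * mask

def quantize_int8(weight):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127].
    # Assumes the tensor is not all zeros (scale would be 0 otherwise).
    scale = weight.abs().max() / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(256, 256)                    # stand-in for one layer's weight matrix
w_pruned = magnitude_prune(w, sparsity=0.5)  # about half the weights set to zero
q, scale = quantize_int8(w_pruned)           # int8 codes plus a single float scale
w_dequant = q.float() * scale                # dequantized copy for accuracy checks

The surveyed methods typically combine such tensor-level steps with fine-tuning, structured sparsity patterns, or mixed-precision bit allocation to recover accuracy; the sketch only shows the basic operations.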
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3465631