Compact Language Models via Pruning and Knowledge Distillation

Bibliographic Details
Main Authors: Muralidharan, Saurav; Sreenivas, Sharath Turuvekere; Joshi, Raviraj; Chochowski, Marcin; Patwary, Mostofa; Shoeybi, Mohammad; Catanzaro, Bryan; Kautz, Jan; Molchanov, Pavlo
Format: Article
Language: English
Description
Summary: Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-training it with a fraction (<3%) of the original training data can be a suitable alternative to repeated, full retraining.
DOI: 10.48550/arxiv.2407.14679
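
The summary describes a prune-then-distill recipe: shrink an existing model by structured pruning, then retrain the smaller model against the original one with knowledge distillation on a small fraction of data. The sketch below is a minimal, illustrative example of that general idea in PyTorch, not the authors' implementation; the activation-based importance score, the toy MLP, the temperature, and all names (prune_mlp_width, distill_step, etc.) are assumptions for illustration.

```python
# Minimal sketch (illustration only): width-prune an MLP by an assumed
# activation-magnitude importance score, then retrain the pruned "student"
# against the original "teacher" with a KL-divergence distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


def prune_mlp_width(mlp: nn.Sequential, calib_x: torch.Tensor, keep: int) -> nn.Sequential:
    """Keep the `keep` hidden neurons with the largest mean activation magnitude."""
    fc1, act, fc2 = mlp[0], mlp[1], mlp[2]
    with torch.no_grad():
        hidden = act(fc1(calib_x))                      # (batch, hidden)
        importance = hidden.abs().mean(dim=0)           # one score per hidden neuron
        idx = importance.topk(keep).indices.sort().values
        new_fc1 = nn.Linear(fc1.in_features, keep)
        new_fc1.weight.copy_(fc1.weight[idx])
        new_fc1.bias.copy_(fc1.bias[idx])
        new_fc2 = nn.Linear(keep, fc2.out_features)
        new_fc2.weight.copy_(fc2.weight[:, idx])
        new_fc2.bias.copy_(fc2.bias)
    return nn.Sequential(new_fc1, type(act)(), new_fc2)


def distill_step(student, teacher, x, optimizer, temperature=2.0):
    """One retraining step: match the teacher's softened output distribution."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    teacher = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 32))
    calib = torch.randn(512, 64)                        # small calibration batch
    student = prune_mlp_width(teacher, calib, keep=128) # halve the MLP width
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    for step in range(100):                             # short distillation run
        loss = distill_step(student, teacher, torch.randn(64, 64), opt)
    print(f"final distillation loss: {loss:.4f}")
```

The same pattern scales conceptually to transformer LLMs by scoring and removing attention heads, MLP channels, or whole layers, and by distilling on a small retraining corpus instead of random tensors; see the paper at the DOI above for the actual method and results.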