Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
Format: | Article |
Language: | English |
Online Access: | Order full text |
Abstract: | As language models have scaled both their number of parameters and pretraining dataset sizes, the computational cost of pretraining has become intractable for all but the most well-resourced teams. This increasing cost makes it ever more important to be able to reuse a model after it has completed pretraining, allowing a model's abilities to further improve without needing to train from scratch. In this work, we detail a set of guidelines that cover how to design efficacious data distributions and learning rate schedules for continued pretraining of language models. When applying these findings within a continued pretraining run on top of a well-trained 15B parameter model, we show an improvement of 9% in average model accuracy compared to the baseline of continued training on the pretraining set. The resulting recipe provides a practical starting point with which to begin developing language models through reuse rather than retraining. |
DOI: | 10.48550/arxiv.2407.07263 |
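The abstract refers to designing learning rate schedules for continued pretraining. As a rough illustration only, and not the paper's actual recipe, the Python sketch below shows one common pattern for such runs: a linear re-warmup to a reduced peak learning rate followed by cosine decay to a floor. The function name and all hyperparameter values are assumptions chosen for the example.

```python
import math

def continued_pretraining_lr(step, total_steps, peak_lr=3e-5, min_lr=3e-6, warmup_steps=500):
    """Illustrative schedule for a continued-pretraining run (placeholder values,
    not taken from the paper): linear re-warmup, then cosine decay to a floor."""
    if step < warmup_steps:
        # Re-warm the learning rate from zero up to the reduced peak.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * min(1.0, progress)))

if __name__ == "__main__":
    # Print the schedule at a few points of a hypothetical 10k-step continuation run.
    for s in (0, 250, 500, 5000, 10000):
        print(s, f"{continued_pretraining_lr(s, 10000):.2e}")
```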