Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Format: Article
Language: English
Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
DOI: 10.48550/arxiv.2312.06585
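
The abstract describes ReST$^{EM}$ as a three-step loop: sample from the model, filter the samples with binary feedback, fine-tune on the survivors, and repeat. Below is a minimal sketch of that loop, assuming the sampler, verifier, and fine-tuning routine are supplied by the caller; the function names and parameters here are hypothetical placeholders for illustration, not the paper's implementation or any library API.

```python
from typing import Callable, List, Tuple

# A minimal sketch of the ReST^EM loop from the abstract. The callables
# `generate`, `is_correct`, and `fine_tune` are hypothetical placeholders
# supplied by the caller; nothing here is the paper's actual implementation.
def rest_em(
    model,
    prompts: List[str],
    generate: Callable,    # generate(model, prompt, n) -> list of candidate outputs
    is_correct: Callable,  # is_correct(prompt, output) -> bool (binary feedback)
    fine_tune: Callable,   # fine_tune(model, data) -> updated model
    num_iterations: int = 3,
    samples_per_prompt: int = 32,
):
    """Iterated self-training: sample, filter by binary feedback, fine-tune, repeat."""
    for _ in range(num_iterations):
        # (1) Generate samples from the model and keep only those that pass
        #     the binary feedback check (e.g., a math-answer verifier or
        #     unit tests for generated code).
        data: List[Tuple[str, str]] = []
        for prompt in prompts:
            candidates = generate(model, prompt, samples_per_prompt)
            data.extend((prompt, c) for c in candidates if is_correct(prompt, c))
        # (2) Fine-tune the model on the filtered, self-generated samples.
        model = fine_tune(model, data)
        # (3) The outer loop repeats the process a few times.
    return model
```

One detail the abstract leaves open is whether each fine-tuning step starts from the base model or from the previous iterate; the sketch above simply threads the returned model through the loop.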