Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
Format: Article
Language: English
Abstract: We consider the problem of online fine-tuning the parameters of a language model at test time, also known as dynamic evaluation. While it is generally known that this approach improves overall predictive performance, especially under distributional shift between training and evaluation data, here we emphasize the perspective that online adaptation turns parameters into temporally changing states and provides a form of context-length extension with memory in weights, more in line with the concept of memory in neuroscience. We pay particular attention to the speed of adaptation (in terms of sample efficiency), sensitivity to the overall distributional drift, and the computational overhead of performing gradient computations and parameter updates. Our empirical study provides insights into when online adaptation is particularly interesting. We highlight that with online adaptation the conceptual distinction between in-context learning and fine-tuning blurs: both are methods to condition the model on previously observed tokens.
DOI: 10.48550/arxiv.2403.01518
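
The abstract describes test-time gradient updates on the evaluation stream. Below is a minimal sketch of one possible chunked implementation using PyTorch and Hugging Face transformers: each chunk is scored with the current (already adapted) weights, then a single gradient step on that chunk updates the weights before the next chunk is predicted. The model name, chunk size, optimizer, and learning rate are illustrative assumptions, not settings from the paper.

```python
# A minimal sketch of dynamic evaluation (online adaptation at test time).
# Assumed choices for illustration: "gpt2", SGD, lr=1e-4, chunk_size=512.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed small model, not the one studied in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # disable dropout; gradients are still computed below

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # assumed hyperparameters
chunk_size = 512


def dynamic_eval(text: str) -> float:
    """Score a long text chunk by chunk, adapting the weights after each chunk."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    total_nll, total_tokens = 0.0, 0
    for start in range(0, ids.size(0) - 1, chunk_size):
        chunk = ids[start:start + chunk_size + 1].unsqueeze(0)
        # 1) Evaluate the chunk with the *current* (already adapted) weights.
        out = model(chunk, labels=chunk)
        n_pred = chunk.size(1) - 1
        total_nll += out.loss.item() * n_pred
        total_tokens += n_pred
        # 2) Adapt: one gradient step on the chunk just observed, so later
        #    chunks are predicted with weights that "remember" it.
        optimizer.zero_grad()
        out.loss.backward()
        optimizer.step()
    return total_nll / total_tokens  # per-token NLL under online adaptation
```

In this sketch the parameters act as a slowly changing state carried across chunks, which is the "memory in weights" view of context-length extension mentioned in the abstract; the adaptation speed and overhead depend on the assumed optimizer, learning rate, and chunk size.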