Many-Shot In-Context Learning
Main authors: , , , , , , , , , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to fine-tuning. We also find that inference cost increases linearly in the many-shot regime, and frontier LLMs benefit from many-shot ICL to varying degrees. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.
DOI: 10.48550/arxiv.2404.11018
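
The abstract describes three prompt-construction settings: standard many-shot ICL (full worked examples), Reinforced ICL (model-generated chain-of-thought rationales filtered for answer correctness), and Unsupervised ICL (questions only, no rationales or answers). The sketch below illustrates how such prompts could be assembled; the `Example` structure, all function names, and the `model.sample_with_rationale` call are hypothetical assumptions for illustration, not the paper's actual code.

```python
# Minimal sketch of the three ICL prompt settings from the abstract.
# All names here are illustrative assumptions, not the authors' code.
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    rationale: str  # chain-of-thought, human-written or model-generated
    answer: str

def many_shot_prompt(examples: list[Example], query: str) -> str:
    """Standard many-shot ICL: hundreds or thousands of worked examples."""
    shots = "\n\n".join(
        f"Q: {ex.question}\nRationale: {ex.rationale}\nA: {ex.answer}"
        for ex in examples
    )
    return f"{shots}\n\nQ: {query}\nRationale:"

def reinforced_icl_examples(model, questions, reference_answers) -> list[Example]:
    """Reinforced ICL: replace human rationales with model-generated
    chain-of-thought, keeping only rationales whose final answer is correct."""
    kept = []
    for q, ref in zip(questions, reference_answers):
        rationale, answer = model.sample_with_rationale(q)  # hypothetical API
        if answer == ref:  # simple correctness filter on the final answer
            kept.append(Example(q, rationale, answer))
    return kept

def unsupervised_icl_prompt(questions: list[str], query: str) -> str:
    """Unsupervised ICL: prompt with domain-specific questions only."""
    listed = "\n\n".join(f"Q: {q}" for q in questions)
    return f"{listed}\n\nQ: {query}\nA:"
```

Under these assumptions, a Reinforced ICL prompt would be composed as `many_shot_prompt(reinforced_icl_examples(model, qs, refs), query)`, so the human-written rationales never enter the context window.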