Schema-learning and rebinding as mechanisms of in-context learning and emergence
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In-context learning (ICL) is one of the most powerful and most unexpected capabilities to emerge in recent transformer-based large language models (LLMs). Yet the mechanisms that underlie it are poorly understood. In this paper, we demonstrate that comparable ICL capabilities can be acquired by an alternative sequence prediction learning method using clone-structured causal graphs (CSCGs). Moreover, a key property of CSCGs is that, unlike transformer-based LLMs, they are *interpretable*, which considerably simplifies the task of explaining how ICL works. Specifically, we show that it uses a combination of (a) learning template (schema) circuits for pattern completion, (b) retrieving relevant templates in a context-sensitive manner, and (c) rebinding of novel tokens to appropriate slots in the templates. We go on to marshal evidence for the hypothesis that similar mechanisms underlie ICL in LLMs. For example, we find that, with CSCGs as with LLMs, different capabilities emerge at different levels of overparameterization, suggesting that overparameterization helps in learning more complex template (schema) circuits. By showing how ICL can be achieved with small models and datasets, we open up a path to novel architectures, and take a vital step towards a more general understanding of the mechanics behind this important capability.
DOI: 10.48550/arxiv.2307.01201
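To make the abstract's three-part account concrete, below is a minimal toy sketch in Python. It is not the paper's CSCG algorithm: hand-written templates with slot variables stand in for learned schema circuits, a linear scan over templates stands in for context-sensitive retrieval, and substituting novel tokens into matched slots stands in for rebinding. All schemas, token names, and matching rules are invented purely for illustration.

```python
# Toy illustration of the three mechanisms named in the abstract:
# (a) stored template (schema) sequences with slot variables,
# (b) context-sensitive retrieval of the template consistent with the context,
# (c) rebinding of novel tokens to slots before pattern completion.
# This is a conceptual sketch only, not the CSCG model from the paper.

SCHEMAS = [
    # "X plus Y equals Y plus X": a commutativity-style pattern
    ["?0", "plus", "?1", "equals", "?1", "plus", "?0"],
    # "X is the capital of Y ; Y 's capital is X": a paraphrase-style pattern
    ["?0", "is", "the", "capital", "of", "?1", ";",
     "?1", "'s", "capital", "is", "?0"],
]

def try_bind(schema, context):
    """Align the start of a schema with the context: literals must match
    exactly, while slots ("?0", "?1", ...) bind to whatever token they see."""
    binding = {}
    for sym, tok in zip(schema, context):
        if sym.startswith("?"):
            if binding.get(sym, tok) != tok:  # a slot may bind only one token
                return None
            binding[sym] = tok
        elif sym != tok:
            return None
    return binding

def complete(context):
    """Retrieve the first schema consistent with the context (retrieval) and
    emit its continuation with bound slots substituted back in (completion
    with rebinding)."""
    for schema in SCHEMAS:
        binding = try_bind(schema, context)
        if binding is not None:
            tail = schema[len(context):]
            return [binding.get(sym, sym) for sym in tail]
    return []

if __name__ == "__main__":
    # "blee", "zorg", "florp", "quux" are novel tokens: the schemas never
    # mention them, yet they rebind cleanly to the slot variables.
    print(complete(["blee", "plus", "zorg", "equals"]))
    # -> ['zorg', 'plus', 'blee']
    print(complete(["florp", "is", "the", "capital", "of", "quux", ";"]))
    # -> ['quux', "'s", 'capital', 'is', 'florp']
```

The point of the sketch is that the continuation is produced correctly for tokens the templates have never seen, because completion operates over slots rather than over the tokens themselves; how such templates and bindings are actually learned and represented in CSCGs is what the paper itself addresses.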