Shared functional specialization in transformer-based language models and the human brain

Bibliographic Details
Published in: Nature Communications 2024-06, Vol. 15 (1), p. 5523-19, Article 5523
Authors: Kumar, Sreejan, Sumers, Theodore R., Yamakoshi, Takateru, Goldstein, Ariel, Hasson, Uri, Norman, Kenneth A., Griffiths, Thomas L., Hawkins, Robert D., Nastase, Samuel A.
Format: Article
Language: English
Description
Abstract: When processing language, the brain is thought to deploy specialized computations to construct meaning from complex linguistic structures. Recently, artificial neural networks based on the Transformer architecture have revolutionized the field of natural language processing. Transformers integrate contextual information across words via structured circuit computations. Prior work has focused on the internal representations (“embeddings”) generated by these circuits. In this paper, we instead analyze the circuit computations directly: we deconstruct these computations into the functionally-specialized “transformations” that integrate contextual information across words. Using functional MRI data acquired while participants listened to naturalistic stories, we first verify that the transformations account for considerable variance in brain activity across the cortical language network. We then demonstrate that the emergent computations performed by individual, functionally-specialized “attention heads” differentially predict brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers and context lengths in a low-dimensional cortical space. The extent to which transformer-based language models provide a good model of human brain activity during natural language comprehension is unclear. Here, the authors show that the internal transformations performed by the network predict brain activity in the cortical language network as measured by fMRI.
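
As a rough illustration of the kind of analysis the abstract describes, the sketch below extracts per-head attention features from GPT-2 (via the Hugging Face transformers library) and fits a ridge-regression encoding model to predict voxel responses. This is not the authors' pipeline: the model choice, the feature construction (attention weights applied to each layer's input states, rather than the per-head value-vector "transformations" analyzed in the paper), and the simulated fMRI data are all illustrative assumptions.

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True,
                                  output_hidden_states=True)
model.eval()

def headwise_features(text):
    # One feature vector per attention head for the final token: each head's
    # attention weights applied to that layer's input hidden states, as a
    # rough stand-in for the head's contextual "transformation".
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    feats = []
    for layer, attn in enumerate(out.attentions):   # attn: (1, heads, T, T)
        layer_input = out.hidden_states[layer][0]   # (T, d_model) input to this layer
        for h in range(attn.shape[1]):
            mixed = attn[0, h] @ layer_input        # attention-weighted mix
            feats.append(mixed[-1].numpy())         # last-token vector
    return np.concatenate(feats)                    # (layers * heads * d_model,)

# Stand-in stimuli and simulated fMRI responses (40 time points x 100 voxels).
segments = [f"The narrator described event number {i} in the story." for i in range(40)]
X = np.stack([headwise_features(s) for s in segments])
rng = np.random.default_rng(0)
Y = rng.standard_normal((len(segments), 100))

# Ridge encoding model: predict each voxel's response from head-wise features,
# then evaluate on held-out time points.
encoder = Ridge(alpha=1.0).fit(X[:30], Y[:30])
print("held-out R^2:", encoder.score(X[30:], Y[30:]))

In the paper itself, head-wise features are compared across cortical parcels to ask which regions are best predicted by which heads; the simulated responses above merely stand in for real fMRI time series.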
ISSN: 2041-1723
DOI: 10.1038/s41467-024-49173-5