Large-scale benchmark yields no evidence that language model surprisal explains syntactic disambiguation difficulty
Published in: Journal of Memory and Language, 2024-08, Vol. 137, p. 104510, Article 104510
Format: Article
Language: English
Online access: Full text
Abstract: Prediction has been proposed as an overarching principle that explains human information processing in language and beyond. To what degree can processing difficulty in syntactically complex sentences – one of the major concerns of psycholinguistics – be explained by predictability, as estimated using computational language models, and operationalized as surprisal (negative log probability)? A precise, quantitative test of this question requires a much larger-scale data collection effort than has been undertaken in the past. We present the Syntactic Ambiguity Processing Benchmark, a dataset of self-paced reading times from 2000 participants, who read a diverse set of complex English sentences. This dataset makes it possible to measure the processing difficulty associated with individual syntactic constructions, and even individual sentences, precisely enough to rigorously test the predictions of computational models of language comprehension. By estimating the function that relates surprisal to reading times from filler items included in the experiment, we find that the predictions of language models with two different architectures sharply diverge from the empirical reading time data: they dramatically underpredict processing difficulty, fail to predict relative difficulty among different syntactically ambiguous constructions, and only partially explain item-wise variability. These findings suggest that next-word prediction is most likely insufficient on its own to explain human syntactic processing.
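The abstract operationalizes predictability as surprisal: surprisal(w_i) = −log₂ P(w_i | w_1 … w_{i−1}), the negative log probability of a word given its preceding context under a language model. As a concrete illustration of how such values are obtained, the sketch below computes per-token surprisal with GPT-2 via the Hugging Face transformers library; the specific model, tokenizer, and example sentence are illustrative assumptions, not the models or materials used in the paper.

```python
# Minimal sketch: per-token surprisal, -log2 p(token | preceding tokens),
# estimated with GPT-2. GPT-2 and the `transformers` API are illustrative
# choices, not necessarily the architectures evaluated in the paper.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence: str):
    """Return (token, surprisal-in-bits) pairs for every token after the first."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                       # (1, seq_len, vocab_size)
    # Position i of the logits predicts token i+1, so shift by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nll_nats = -log_probs[torch.arange(targets.size(0)), targets]
    surprisal_bits = nll_nats / math.log(2)              # convert nats to bits
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, surprisal_bits.tolist()))

# Example garden-path sentence (illustrative, not an item from the benchmark):
for tok, s in token_surprisals("The horse raced past the barn fell."):
    print(f"{tok:>12}  {s:6.2f} bits")
```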
Highlights:
• We collect a large reading time dataset for English syntactically complex sentences.
• We evaluate to what extent surprisal – estimated using neural network language models – can explain processing difficulty.
• Surprisal greatly underestimated processing difficulty for most constructions.
• It failed to predict relative difficulty among different garden path constructions.
• It also failed to predict across-item variation.
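The evaluation logic summarized above hinges on first estimating, from filler items, the function linking surprisal to reading times, and then comparing the reading times that function predicts for the critical constructions against the observed ones. The sketch below assumes a simple linear link and entirely made-up numbers purely to make that logic concrete; the paper's actual estimation procedure and data are not reproduced here.

```python
# Minimal sketch of the linking-function evaluation: fit RT = a + b * surprisal
# on filler items, then compare predicted and observed RTs on critical items.
# The linear form and all numbers below are hypothetical, for illustration only.
import numpy as np

def fit_linear_link(filler_surprisal, filler_rt):
    """Least-squares fit of RT = a + b * surprisal on filler items."""
    X = np.column_stack([np.ones_like(filler_surprisal), filler_surprisal])
    coef, *_ = np.linalg.lstsq(X, filler_rt, rcond=None)
    return coef  # (intercept a, slope b)

def predict_rt(coef, surprisal):
    a, b = coef
    return a + b * surprisal

# Hypothetical filler data (surprisal in bits, reading times in ms):
filler_s = np.array([2.1, 4.5, 6.0, 8.2, 10.3])
filler_rt = np.array([310.0, 330.0, 345.0, 365.0, 390.0])
coef = fit_linear_link(filler_s, filler_rt)

# Hypothetical critical items: surprisal at the disambiguating word,
# alongside the (much larger) observed garden-path slowdown.
critical_s = np.array([12.0, 15.5])
observed_rt = np.array([520.0, 610.0])
print("predicted:", predict_rt(coef, critical_s))  # falls well short of
print("observed: ", observed_rt)                   # the observed difficulty
```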
ISSN: 0749-596X, 1096-0821
DOI: 10.1016/j.jml.2024.104510