Statistical word segmentation succeeds given the minimal amount of exposure


Bibliographic details
Published in: Psychonomic bulletin & review 2024-06, Vol.31 (3), p.1172-1180
Main authors: Hao Wang, Felix; Luo, Meili; Wang, Suiping
Format: Article
Language: English
Description
Summary: One of the first tasks in language acquisition is word segmentation, a process to extract word forms from continuous speech streams. Statistical approaches to word segmentation have been shown to be a powerful mechanism, in which word boundaries are inferred from sequence statistics. This approach requires the learner to represent the frequency of units from syllable sequences, though accounts differ on how much statistical exposure is required. In this study, we examined the computational limit with which words can be extracted from continuous sequences. First, we discussed why two occurrences of a word in a continuous sequence is the computational lower limit for this word to be statistically defined. Next, we created short syllable sequences that contained certain words either two or four times. Learners were presented with these syllable sequences one at a time, immediately followed by a test of the novel words from these sequences. We found that, with the computationally minimal amount of two exposures, words were successfully segmented from continuous sequences. Moreover, longer syllable sequences providing four exposures to words generated more robust learning results. The implications of these results are discussed in terms of how learners segment and store the word candidates from continuous sequences.
ISSN: 1069-9384, 1531-5320
DOI: 10.3758/s13423-023-02386-z
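
Illustration: The summary's claim that two occurrences are the lower limit for a word to be statistically defined can be sketched with a simple count. The snippet below is not the authors' model or stimulus set. It is a minimal illustration that assumes a made-up syllable stream, a hypothetical helper `repeated_ngrams`, and a fixed word length of three syllables. If every other syllable occurs only once, a chunk that appears twice is the only unit whose frequency stands out; with a single occurrence, nothing does.

```python
from collections import Counter


def repeated_ngrams(syllables, n):
    """Return the n-syllable chunks that occur at least twice in the sequence.

    Hypothetical helper for illustration only, not taken from the article.
    """
    counts = Counter(
        tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)
    )
    return {chunk for chunk, count in counts.items() if count >= 2}


# Invented stream: the made-up word "tu-pi-ro" occurs twice,
# every other syllable occurs exactly once.
stream = "ba tu pi ro ge ki tu pi ro la".split()
words = repeated_ngrams(stream, 3)

# Same idea with a single occurrence: no chunk repeats, so no candidate emerges.
single = repeated_ngrams("ba tu pi ro ge ki la".split(), 3)
```

Here `words` contains only the chunk `("tu", "pi", "ro")`, while `single` is empty, which mirrors the reasoning that one exposure gives the learner no frequency contrast to work with.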