Measuring Frequency of Child-Directed WH-Question Words for Alternate Preschool Locations Using Speech Recognition and Location Tracking Technologies

Bibliographic details
Published in: Grantee Submission 2021
Main authors: Kothalkar, Prasanna V; Datla, Sathvik; Dutta, Satwik; Hansen, John H. L; Seven, Yagmur; Irvin, Dwight; Buzhardt, Jay
Format: Report
Language: English
Description
Abstract: Speech and language development in children is crucial to their long-term learning ability. A child's vocabulary size at the time of entry into kindergarten is an early indicator of their ability to learn to read and of their potential long-term success in school. The preschool classroom is thus a promising venue for assessing growth in young children by measuring their interactions with teachers as well as classmates. However, to date, few studies have explored such naturalistic audio communications. Automatic Speech Recognition (ASR) technologies give 'Early Childhood' researchers an opportunity to measure these interactions through automatic analysis of naturalistic classroom recordings. For this purpose, 208 hours of audio recordings across 48 daylong sessions were collected in a childcare learning center in the United States using Language Environment Analysis (LENA) devices worn by the preschool children. Approximately 29 hours of adult speech and 26 hours of child speech were segmented using manual transcriptions provided by the CRSS transcription team. Traditional as well as End-to-End ASR models were trained on the adult and child speech data subsets. A Factorized Time Delay Neural Network yields the best Word-Error-Rate (WER), 35.05%, on the adult subset of the test set. End-to-End transformer models achieve 63.5% WER on the child subset of the test data. Next, bar plots showing the frequency of WH-question words in the Science vs. Reading activity areas of the preschool are presented for sessions in the test set. The authors suggest that, given such speech/audio assessment strategies, learning spaces could be configured to encourage greater adult-child conversational engagement. [This paper was published in: "ICMI '21 Companion, October 18-22, 2021, Montreal, QC, Canada," Association for Computing Machinery, 2021.]
DOI: 10.1145/3461615.3485440
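
Illustrative note: the two quantities named in the abstract, Word-Error-Rate and per-area WH-question word frequency, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the WH-word list (including "how"), the function names, the whitespace tokenization, and the (location, transcript) input layout are all hypothetical choices made for illustration. WER is computed here in the standard way, as word-level edit distance divided by the number of reference words.

```python
from collections import Counter

# Assumed WH-word inventory; the paper's exact list is not given in the abstract.
WH_WORDS = {"what", "who", "where", "when", "why", "which", "how"}


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def count_wh_words(utterances):
    """Count WH-words per location from (location, transcript) pairs."""
    counts = {}
    for location, text in utterances:
        # Strip simple trailing/leading punctuation before matching.
        words = (w.strip(".,?!") for w in text.lower().split())
        counts.setdefault(location, Counter()).update(
            w for w in words if w in WH_WORDS
        )
    return counts


if __name__ == "__main__":
    print(word_error_rate("what is in the box", "what is the box"))
    sample = [
        ("science", "What is that?"),
        ("reading", "Who is this? Where is he?"),
        ("science", "Why does it float"),
    ]
    for area, counter in count_wh_words(sample).items():
        print(area, dict(counter))
```

In practice, the location label for each utterance would come from the location tracking data and the transcript from ASR output; the per-area counters could then be plotted as bar charts.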