End-to-end child-adult speech diarization in naturalistic conditions of preschool classrooms

Bibliographic details
Published in: The Journal of the Acoustical Society of America, 2023-03, Vol. 153 (3_supplement), p. A174
Main authors: Kothalkar, Prasanna V.; Irvin, Dwight; Buzhardt, Jay; Hansen, John H.
Format: Article
Language: English
Online access: Full text
Description
Summary: Speech and language development are early indicators of overall analytical and learning ability in preschool children. Early childhood researchers are interested in analyzing naturalistic rather than controlled lab recordings to assess both the quality and quantity of communication interactions between children and adults/teachers. Unfortunately, present-day speech technologies cannot handle the highly dynamic conditions of early childhood classroom settings. Because of the diversity of acoustic events and conditions in daylong audio streams, automated speaker diarization technology is limited and must be advanced to address this challenging domain for audio segmentation and metadata extraction. We investigate a deep-learning-based diarization solution for segmenting classroom interactions of 3- to 5-year-old children engaging with teachers. Here, the focus is on speaker-label diarization, which classifies speech segments as belonging to either Adults or Children, partitioned across multiple classrooms. Our proposed ECAPA-TDNN model achieves a best F1-score of 65.5% on data from two classrooms, based on open dev and test sets for each classroom. In addition, F1-scores for individual speaker labels provide a breakdown of performance across naturalistic child classroom engagement. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis, while maintaining both the security and privacy of all children and adults.
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/10.0018568
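
The summary reports F1-scores both overall and per speaker label (Adults, Children). As a rough illustration of that kind of metric, the sketch below computes a per-label F1 and an unweighted average from two equal-length label sequences. This is not taken from the paper: the segment-level scoring, the macro averaging, the function names, and the toy labels are all assumptions made for illustration, and the abstract does not say how its 65.5% figure was computed.

```python
from collections import Counter


def per_label_f1(reference, predicted, labels=("adult", "child")):
    """Return {label: F1} for two equal-length sequences of segment labels.

    Hypothetical helper: assumes one label per segment and that the
    reference and predicted segments are already aligned one-to-one.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    counts = Counter()
    for ref, hyp in zip(reference, predicted):
        if ref == hyp:
            counts[("tp", ref)] += 1   # correct label
        else:
            counts[("fn", ref)] += 1   # missed the reference label
            counts[("fp", hyp)] += 1   # wrongly predicted this label

    scores = {}
    for label in labels:
        tp = counts[("tp", label)]
        fp = counts[("fp", label)]
        fn = counts[("fn", label)]
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-label F1 values (an assumed averaging choice)."""
    return sum(scores.values()) / len(scores)


# Toy data, invented for illustration only.
reference = ["adult", "adult", "child", "child", "child", "adult"]
predicted = ["adult", "child", "child", "child", "adult", "adult"]

scores = per_label_f1(reference, predicted)
overall = macro_f1(scores)
print(scores, overall)
```

Reporting the per-label values alongside the average shows whether errors are concentrated in one speaker class, which is the kind of breakdown the summary describes.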