Semi-supervised Learning For Robust Speech Evaluation
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Abstract: | Speech evaluation measures a learner's oral proficiency using automatic models. Corpora for training such models often pose sparsity challenges: scored data from teachers is limited, and the score distribution across proficiency levels is often imbalanced among student cohorts. Automatic scoring is thus not robust when faced with under-represented or out-of-distribution samples, which inevitably exist in real-world deployment scenarios. This paper proposes to address these challenges by exploiting semi-supervised pre-training and objective regularization to approximate subjective evaluation criteria. In particular, normalized mutual information is used to quantify the speech characteristics of the learner and the reference. An anchor model is trained using pseudo labels to predict the correctness of pronunciation. An interpolated loss function is proposed to minimize not only the prediction error with respect to ground-truth scores but also the divergence between two probability distributions estimated by the speech evaluation model and the anchor model. Compared to other state-of-the-art methods on a public dataset, this approach not only achieves high performance when evaluating the entire test set as a whole, but also yields the most evenly distributed prediction error across distinct proficiency levels. Furthermore, empirical results show that the model's accuracy on out-of-distribution data also compares favorably with competitive baselines. |
---|---|
DOI: | 10.48550/arxiv.2409.14666 |
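
The abstract's use of normalized mutual information to compare learner and reference speech lends itself to a brief illustration. The sketch below is a minimal reading, assuming both utterances have already been discretized into frame-aligned label sequences (e.g., phone or cluster IDs per frame); the example labels and the use of scikit-learn's `normalized_mutual_info_score` are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch, assuming learner and reference speech have been
# force-aligned and discretized into equal-length label sequences
# (e.g., phone or cluster IDs per frame). The representation choice
# is an assumption, not the paper's exact method.
from sklearn.metrics import normalized_mutual_info_score

reference_units = [3, 3, 7, 7, 7, 1, 4, 4]  # hypothetical reference frame labels
learner_units = [3, 3, 7, 7, 2, 1, 4, 4]    # hypothetical learner frame labels

nmi = normalized_mutual_info_score(reference_units, learner_units)
print(f"NMI(learner, reference) = {nmi:.3f}")  # 1.0 means identical label structure
```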
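Similarly, the interpolated loss described in the abstract can be read as a convex combination of a regression term against teacher-assigned scores and a divergence term between the evaluation model's distribution and the frozen anchor model's. The PyTorch sketch below is one plausible instantiation; the mixing weight `lam`, the MSE choice, and the KL direction are all assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def interpolated_loss(pred_score, gt_score, eval_logits, anchor_logits, lam=0.5):
    """Sketch of an interpolated objective: prediction error against
    ground-truth scores plus divergence between the distributions of the
    speech evaluation model and the anchor model. `lam` and the exact
    terms are assumptions, not the paper's formulation."""
    # Regression error against teacher-assigned ground-truth scores.
    mse = F.mse_loss(pred_score, gt_score)
    # Divergence between the two models' distributions over, e.g.,
    # pronunciation-correctness classes. F.kl_div(input, target) computes
    # KL(target || input), so this is KL(P_anchor || P_eval); the direction
    # is an assumption. The anchor model is treated as fixed (detached).
    log_p_eval = F.log_softmax(eval_logits, dim=-1)
    p_anchor = F.softmax(anchor_logits.detach(), dim=-1)
    kl = F.kl_div(log_p_eval, p_anchor, reduction="batchmean")
    # Convex combination of the two objectives.
    return (1.0 - lam) * mse + lam * kl

# Example usage with hypothetical shapes: batch of 8, 2 correctness classes.
loss = interpolated_loss(torch.rand(8), torch.rand(8),
                         torch.randn(8, 2), torch.randn(8, 2))
```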