Automatic assessment of English proficiency for Japanese learners without reference sentences based on deep neural network acoustic models

Bibliographic details
Published in:	Speech Communication, 2020-01, Vol. 116, pp. 86-97
Main authors:	Fu, Jiang; Chiba, Yuya; Nose, Takashi; Ito, Akinori
Format:	Article
Language:	English
Subjects:	Acoustic models; Acoustics; Artificial neural networks; Automatic proficiency assessment; Computer assisted language learning; Computer-assisted language learning (CALL); Conversation; Deep neural network (DNN); English as a second language learning; English proficiency; Error analysis; Evaluation; Japanese language; Japanese learners; Markov analysis; Markov chains; Neural networks; Non-native speech; Oral reading; Probabilistic models; Pronunciation; Recognition; Sentences; Speech recognition; Spontaneous speech; Transcription
Online access:	Full text

Description
Summary:
• A novel machine score for automatic pronunciation evaluation is proposed: the Reference-free Error Rate (RER).
• Non-native and native acoustic models are combined in an ASR-based automatic English proficiency evaluation system.
• The DNN-based acoustic models significantly improved recognition accuracy.
• The resulting system can evaluate a speaker's utterance without knowing its transcription in advance.
• The proposed RER score correlates highly with human proficiency scores.
Speech-based computer-assisted language learning (CALL) systems should recognize the learner's utterances with high accuracy and evaluate the speaker's language proficiency with appropriate methods. In this paper, we discuss the automatic assessment of second-language (L2) proficiency for non-native speakers. Many existing studies evaluate pronunciation using the goodness of pronunciation (GOP) method. This paper introduces an automatic proficiency evaluation system that combines various kinds of non-native and native acoustic models, such as Gaussian mixture model (GMM)-hidden Markov model (HMM) and deep neural network (DNN)-HMM. Most existing work assumes that the transcription of an utterance (the reference sentence) is known when the utterance is evaluated, especially in reading and repeating tasks. To realize reference-free proficiency evaluation, we propose a novel machine score named the reference-free error rate (RER) to evaluate English proficiency. In our experiments, the DNN-based non-native acoustic models outperformed the traditional acoustic models on non-native speech recognition. We therefore calculated the RER by treating the recognition result from the DNN-based non-native acoustic model as the "reference" and the result from the native acoustic model as the "recognition result". The proposed RER correlates highly with human proficiency scores, which indicates that RER is effective for automatically estimating proficiency. By combining the RER with other machine scores such as the log-likelihood scores, we obtained high correlation (reading aloud task: r=0.826,p
ISSN:	0167-6393 1872-7182
DOI:	10.1016/j.specom.2019.12.002
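
The summary describes the RER as an error rate in which the output of the DNN-based non-native acoustic model stands in for the reference and the output of the native acoustic model is scored as the recognition result. This record does not give the exact formula, so the following is only a minimal sketch under stated assumptions: it assumes the score is computed like a conventional word error rate (word-level edit distance divided by the length of the pseudo-reference). The function names and the example sentences are hypothetical and are not taken from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def reference_free_error_rate(nonnative_result, native_result):
    """Assumed RER: WER-style score with the non-native model output as pseudo-reference."""
    ref_words = nonnative_result.split()
    hyp_words = native_result.split()
    if not ref_words:
        raise ValueError("pseudo-reference is empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical recognition results for one learner utterance.
nonnative_output = "i want to go to the station"  # treated as the "reference"
native_output = "i won to go the nation"          # treated as the "recognition result"
rer = reference_free_error_rate(nonnative_output, native_output)
print(f"{rer:.3f}")  # 3 edits over 7 pseudo-reference words -> 0.429
```

Under this assumption, the more the two recognizers disagree on an utterance, the higher the score becomes, and no transcription of the utterance is needed at any point.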