Acoustic model adaptation based on pronunciation variability analysis for non-native speech recognition
Published in: | Speech Communication 2007, Vol. 49 (1), p. 59-70 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | This paper investigates pronunciation variability between native and non-native speakers and proposes a novel acoustic model adaptation method, based on pronunciation variability analysis, to improve speech recognition performance for non-native speakers. The proposed method proceeds in two steps: analysis of the pronunciation variability of non-native speech, followed by acoustic model adaptation based on that analysis. To obtain informative variant phonetic units, the pronunciation variability of non-native speech is analyzed in two ways: a knowledge-based approach and a data-driven approach. For each approach, the acoustic model corresponding to each informative variant phonetic unit is then adapted so that the state-tying of the acoustic model for non-native speech reflects the phonetic variability. For further improvement, a conventional acoustic model adaptation method such as maximum likelihood linear regression (MLLR) and/or maximum a posteriori (MAP) adaptation is combined with the proposed method. Continuous Korean–English speech recognition experiments show that the proposed method achieves average word error rate reductions of 16.76% and 12.80% for the knowledge-based and data-driven approaches, respectively, compared with a baseline speech recognition system trained on native speech. Moreover, applying MLLR and MAP adaptation to the acoustic models adapted by the proposed method reduces the average word error rate by 53.45% and 57.14% for the knowledge-based and data-driven approaches, respectively. |
---|---|
ISSN: | 0167-6393 1872-7182 |
DOI: | 10.1016/j.specom.2006.10.006 |
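
The abstract mentions combining the proposed method with conventional MAP adaptation. As a rough illustration only, the standard MAP update of a single Gaussian mean can be sketched as below. This is not the paper's state-tying method, and the function name `map_adapt_mean`, its parameters, and the default prior weight `tau` are hypothetical choices made for this example.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP re-estimation of one Gaussian mean (illustrative sketch).

    prior_mean: mean vector of the native-speech model (list of floats)
    frames:     adaptation feature vectors, e.g. from non-native speech
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight; the default is a hypothetical value
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # total soft count assigned to this Gaussian
    weighted_sum = [0.0] * dim
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the adaptation data:
    # with little data the result stays near the prior,
    # with much data it approaches the data mean.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

For instance, `map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)` moves the mean halfway toward the data mean, and calling it with no adaptation frames returns the prior mean unchanged.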