Local Mismatch Phone for Confidence Measure in Standard and Accented Chinese Speech Recognition

High error rate in speech recognition is largely due to effects of phone local mismatch caused by unclear speaking or noises. In this paper, we propose an approach of using local mismatch phone to improve the reliability of confidence measure. The features of local mismatch phone can be extracted fr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Cao, Wenxiao, Liu, Yi, Zheng, Thomas Fang
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:High error rate in speech recognition is largely due to effects of phone local mismatch caused by unclear speaking or noises. In this paper, we propose an approach of using local mismatch phone to improve the reliability of confidence measure. The features of local mismatch phone can be extracted from the recognition phone sequence by computing occurrence frequency of each phone and comparing with a preset threshold. Occurrence frequency is defined as occurrence time of recognition phone in its frame best phone sequence divided by interval. Frame best phone is the symbol of HMM state at the end of maximum likelihood token at certain frame. The effectiveness of this feature is evaluated on standard and accented Mandarin speech databases. It gives significant Equal Error Rate reduction of 19.7% and 8.4%, respectively. In addition to fast computation, this feature is independent of acoustic model, and is convenient for combination with other features.
DOI:10.1109/CHINSL.2008.ECP.64