String-based minimum verification error (SB-MVE) training for speech recognition

Bibliographic details
Published in: Computer speech & language 1997-04, Vol.11 (2), p.147-160
Main authors: Rahim, Mazin G., Lee, Chin-Hui
Format: Article
Language: English
Description
Abstract: In recent years, we have experienced an increasing demand for speech recognition technology to be utilized in various real-world applications, such as name dialling and message retrieval. During this process, we have learned that the performance of speech recognition systems in a laboratory environment cannot be duplicated in actual service. Two major causes of this problem have been identified. The first is the lack of robustness when the acoustic conditions in testing are different from those in training. The second is the lack of flexibility when handling spontaneous speech input, which often contains extraneous speech in addition to the desired speech segments of key phrases. This paper focuses on one aspect of achieving flexible speech recognition, namely, improving the ability to cope with naturally spoken utterances through discriminative utterance verification. We propose an algorithm for training utterance verification systems based on the minimum verification error (MVE) training framework. Experimental results on speaker-independent telephone-based connected digits show a significant improvement in verification accuracy when the discriminant function used in MVE training is made consistent with the confidence measure used in utterance verification. At a 10% rejection rate, for example, the proposed method reduces the string error rate by a further 22.7% over our previously reported results, in which the MVE-based discriminative training was not incorporated.
ISSN: 0885-2308, 1095-8363
DOI: 10.1006/csla.1997.0026