A study on robust utterance verification for connected digits recognition
Published in: | The Journal of the Acoustical Society of America 1997-05, Vol.101 (5), p.2892-2902 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | Utterance verification represents a key technology in the design of a user-friendly speech recognition system. One essential element when designing such a system is the ability to maintain uniform performance over a wide range of acoustic conditions. An acoustic mismatch between training and testing conditions often results in an undesirable performance degradation. This paper addresses the issue of robustness in utterance verification of a speech recognition system. Two techniques, namely signal bias removal (SBR) and on-line adaptation, are studied. The SBR algorithm is used to deal with global mismatch conditions caused by handset and channel differences. The on-line adaptation algorithm is used to adjust the verification threshold at runtime to achieve a desirable trade-off between false rejection and false alarm in new test conditions. Various on-line adaptation schemes are investigated. We show that both supervised and unsupervised adaptation can effectively adjust the verification threshold to achieve a desirable performance trade-off irrespective of the initial setting of the threshold. We report on connected digit recognition/verification results for matched and mismatched training and testing conditions. At a 5% digit string rejection rate, the proposed robust utterance verification system gives a reduction in string error rate of between 32% and 35% over the conventional system, while still correctly rejecting over 99.9% of nonvocabulary utterances. |
---|---|
ISSN: | 0001-4966 1520-8524 |
DOI: | 10.1121/1.418519 |