Towards improving ASR robustness for PSN and GSM telephone applications
Published in: | Speech Communication, October 1997, Vol. 23 (1), pp. 141-159 |
---|---|
Main authors: | |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | In real-life applications, errors in speech recognition systems are mainly due to inefficient detection of speech segments, unreliable rejection of Out-Of-Vocabulary (OOV) words, and insufficient accounting for noise and transmission channel effects. In this paper, we review a set of techniques developed at CNET to increase robustness to mismatches between training and testing conditions. These techniques are divided into two classes: preprocessing techniques and adaptation of Hidden Markov Model (HMM) parameters. The results of several experiments carried out on field databases, as well as on databases collected over PSN and GSM networks, are presented, and the main sources of errors are analyzed. We show that a blind equalization scheme significantly improves recognition accuracy on both field and GSM data. Speech detection allows a system to delimit the boundaries of the words to be recognized; we also use preprocessing techniques to make such detectors more robust to noisy GSM speech, and show that spectral subtraction improves speech detection under noisy GSM conditions. Bayesian adaptation of HMM parameters produces models that are robust to field and GSM conditions. Models robust to GSM conditions can also be generated by linear regression adaptation of HMM parameters, and our experiments show that Bayesian and linear regression adaptation yield equivalent performance. The results also show that HMM adaptation and preprocessing techniques can be advantageously combined to improve Automatic Speech Recognition (ASR) robustness.
In applications, the errors of an automatic speech recognition system are mainly due to inefficient detection of speech segments in the signal, unreliable rejection of out-of-vocabulary words or noises, and insufficient consideration of noise and transmission channel effects. In this paper, we review a set of techniques developed at CNET to increase the robustness of a recognition system to variations between its operating and training conditions. These techniques fall into two classes: preprocessing and adaptation of hidden Markov model (HMM) parameters. The results of several experiments carried out on field databases, as well as on databases collected over the PSN (RTC) and GSM networks, are presented. The sources |
---|---|
ISSN: | 0167-6393 1872-7182 |
DOI: | 10.1016/S0167-6393(97)00042-3 |
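The abstract reports that spectral subtraction improves speech detection on noisy GSM speech. The sketch below illustrates generic power spectral subtraction on precomputed per-frame power spectra; it is not CNET's implementation, which this record does not describe. The function name `spectral_subtraction`, the assumption that the leading frames contain noise only, the over-subtraction factor `alpha`, and the spectral floor `beta` are all illustrative choices, not details taken from the paper.

```python
from typing import List


def spectral_subtraction(
    power_frames: List[List[float]],
    noise_frames: int = 10,
    alpha: float = 2.0,
    beta: float = 0.01,
) -> List[List[float]]:
    """Hypothetical sketch of power spectral subtraction with a spectral floor.

    power_frames: one list of per-bin power values per analysis frame.
    noise_frames: number of leading frames ASSUMED to contain noise only
                  (illustrative default, not from the paper).
    alpha: over-subtraction factor applied to the noise estimate.
    beta: spectral floor, as a fraction of the noise estimate.
    """
    if not power_frames:
        return []
    # Use at least one frame so the average below is always defined.
    n = max(1, min(noise_frames, len(power_frames)))
    n_bins = len(power_frames[0])
    # Estimate the noise power spectrum by averaging the leading frames bin by bin.
    noise = [sum(frame[k] for frame in power_frames[:n]) / n for k in range(n_bins)]
    cleaned = []
    for frame in power_frames:
        # Subtract the scaled noise estimate, but never drop below the floor,
        # which limits the "musical noise" caused by negative or zero bins.
        cleaned.append(
            [max(p - alpha * noise[k], beta * noise[k]) for k, p in enumerate(frame)]
        )
    return cleaned
```

For example, `spectral_subtraction([[1.0, 2.0], [1.0, 2.0], [5.0, 10.0]], noise_frames=2, alpha=1.0, beta=0.1)` returns `[[0.1, 0.2], [0.1, 0.2], [4.0, 8.0]]`: the noise-only frames collapse to the floor while the louder frame keeps its excess energy, which makes an energy threshold for a speech detector easier to set.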