Hidden Markov model representation of quantized articulatory features for speech recognition
Saved in:
Published in: | Computer Speech & Language 1993-07, Vol.7 (3), p.265-282 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | This paper describes a speech recognizer based on an HMM representation of quantized articulatory features and presents experimental results for its evaluation. Traditional schemes for HMM representation of speech have attempted to model a set of disjoint time segments. In order to create a more robust speech recognition system, the speech production system is characterized by a set of articulatory features, each of which is allowed to vary over a range of discrete values. Each configuration of the articulatory system is characterized by a particular combination of feature values. "Target configurations" of the articulatory system are those configurations which produce the distinctive homogeneous segments in the acoustic signal. These feature values are permitted to vary independently and asynchronously (with appropriate constraints) as the production system moves from one target configuration to the next (such intermediate feature combinations are referred to as "transitional configurations"). This avoids the abrupt model changes inherent in non-overlapping segment modeling. The feature value combinations that occur while in transit between target configurations represent the coarticulation intervals between the two targets. This scheme is implemented using an ergodic HMM to control the evolution of the feature values as the system moves from one target configuration to the next. Speech recognition results show that the new system outperforms the traditional HMM approaches in small tasks. Examination of the source of error, using Viterbi analysis, in both the new model and in traditional HMM recognition schemes suggests that this new scheme is able to achieve better modeling of the acoustic transitions and coarticulation in speech. |
---|---|
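The core idea in the abstract — HMM states that are combinations of quantized articulatory feature values, decoded with the Viterbi algorithm — can be illustrated with a minimal sketch. This is not the paper's implementation: the feature names, the two-valued quantization, and all transition/emission probabilities below are invented toy assumptions, and the fully ergodic transition matrix omits the paper's asynchrony constraints.

```python
import numpy as np

# Hypothetical sketch: two binary articulatory features (e.g. voicing,
# nasality) give four HMM states, one per feature-value combination.
# None of these names or numbers come from the paper.
features = [(v, n) for v in (0, 1) for n in (0, 1)]
S = len(features)

# Ergodic transition matrix: every configuration can follow every other,
# with a toy preference for staying in the current configuration.
A = np.full((S, S), 0.1)
np.fill_diagonal(A, 0.7)
pi = np.full(S, 1.0 / S)  # uniform initial distribution

# Toy emission probabilities over 3 discrete acoustic symbols.
B = np.array([[0.80, 0.10, 0.10],
              [0.10, 0.80, 0.10],
              [0.10, 0.10, 0.80],
              [0.34, 0.33, 0.33]])

def viterbi(obs):
    """Most likely feature-combination path for a discrete symbol sequence."""
    T = len(obs)
    delta = np.zeros((T, S))           # best path score ending in each state
    psi = np.zeros((T, S), dtype=int)  # best predecessor for backtracking
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A  # scores[i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# A feature value flips mid-utterance as the acoustics change.
path = viterbi([0, 0, 2, 2])
print([features[s] for s in path])  # → [(0, 0), (0, 0), (1, 0), (1, 0)]
```

In this toy decode the state path moves through a feature change as the observations shift, which is the kind of transition the paper's "transitional configurations" are meant to capture; the paper additionally constrains which feature combinations may follow one another, which this unconstrained ergodic sketch does not.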
ISSN: | 0885-2308 1095-8363 |
DOI: | 10.1006/csla.1993.1014 |