Structural design of hidden Markov model speech recognizer using multivalued phonetic features: comparison with segmental speech units


Bibliographic details
Published in: The Journal of the Acoustical Society of America, 1992-12, Vol. 92 (6), pp. 3058-3067
Main authors: Deng, L.; Erler, K.
Format: Article
Language: English
Online access: Full text
Description
Summary: A novel approach to speech recognition, based on a multidimensional multivalued phonetic-feature description of speech signals, is presented and evaluated. The hidden Markov model (HMM) framework is used to provide the recognition algorithm, which assumes that the underlying Markov chain tracks the temporal evolution of the features. It is shown that this approach can naturally accommodate such coarticulatory effects as feature spreading and formant transition in the functionality of the recognizer, and can provide a high degree of acoustic data sharing that makes effective use of training data. Use of phonetic features as the basic speech units creates a framework in which the Markov model's state topology in the recognizer can be designed with the guidance of detailed speech knowledge. Details of such a design for a stop consonant-vowel vocabulary are described. Experimental results on the task of speaker-dependent stop consonant discrimination, evaluated on speech data from a total of ten male and five female speakers, demonstrate the effectiveness of this feature-based recognizer. Over the 15 speakers, the error rates were shown to be reduced by 23%, 37%, 42%, and 38%, respectively, compared with the conventional HMM-based recognition methods using words, phonemes, allophones, and microsegments as the primary speech units.
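The data-sharing idea in the summary can be illustrated with a minimal sketch. It is not the authors' design: the feature names and values, the three-state layout (closure, transition, vowel), and the helpers `feature_states` and `word_states` are all hypothetical, chosen only to show how keying HMM states by feature bundles lets several stop consonant-vowel syllables reuse the same state, whereas whole-word units share nothing.

```python
# Hypothetical toy feature values; NOT the feature inventory used in the paper.
CONSONANTS = {
    "b": ("labial", "voiced"),
    "d": ("alveolar", "voiced"),
    "g": ("velar", "voiced"),
}
VOWELS = {
    "a": ("low", "back"),
    "i": ("high", "front"),
}


def feature_states(syllable):
    """Map a consonant-vowel syllable to states keyed by feature bundles.

    Assumed layout: a closure state (consonant features only), a transition
    state (consonant place combined with vowel features, standing in for the
    formant transition), and a steady vowel state (vowel features only).
    """
    c, v = syllable
    place, voicing = CONSONANTS[c]
    height, backness = VOWELS[v]
    return [
        ("closure", place, voicing),
        ("transition", place, height, backness),
        ("vowel", height, backness),
    ]


def word_states(syllable):
    """Whole-word units: every syllable owns three private states."""
    return [(syllable, i) for i in range(3)]


vocabulary = [c + v for c in CONSONANTS for v in VOWELS]

# Distinct states that would need their own output distribution.
unshared = {s for syl in vocabulary for s in word_states(syl)}
shared = {s for syl in vocabulary for s in feature_states(syl)}

print(len(unshared), len(shared))  # 18 11
```

In this toy vocabulary the feature-keyed model needs 11 distinct states instead of 18, so training frames from "ba" and "da", for example, would both contribute to the same vowel state.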
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.404202