Approaches to Iterative Speech Feature Enhancement and Recognition

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2009-07, Vol. 17 (5), pp. 974-984
Authors:	Windmann, S.; Haeb-Umbach, R.
Format:	Article
Language:	English
Keywords:	Applied sciences; Architecture; Automatic speech recognition; Coding, codes; Decoding; Dynamical systems; Estimates; Exact sciences and technology; Feedback; Hidden Markov models; Information, signal and communications theory; Iterative decoding; Iterative methods; Mathematical models; Pattern recognition; Recognition; Robust speech recognition; Signal and communications theory; Signal processing; Spatial databases; Speech; Speech enhancement; Speech processing; Speech recognition; State estimation; Studies; Telecommunications and information theory; Uncertainty

Description
Abstract:	In automatic speech recognition, hidden Markov models (HMMs) are commonly used for speech decoding, while switching linear dynamic models (SLDMs) can be employed for a preceding model-based speech feature enhancement. In this paper, these model types are combined in order to obtain a novel iterative speech feature enhancement and recognition architecture. It is shown that speech feature enhancement with SLDMs can be improved by feeding back information from the HMM to the enhancement stage. Two different feedback structures are derived. In the first, the posteriors of the HMM states are used to control the model probabilities of the SLDMs, while in the second they are employed to directly influence the estimate of the speech feature distribution. Both approaches improve recognition accuracy on the AURORA2 and AURORA4 databases compared with non-iterative speech feature enhancement with SLDMs. It is also shown that a combination with uncertainty decoding further enhances performance.
ISSN:	1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI:	10.1109/TASL.2009.2014894
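The abstract names two feedback structures but does not state their equations. The following Python sketch is a hypothetical illustration of the two ideas under stated assumptions, not the authors' method. For the first structure it assumes a mapping `state_to_model`, where `state_to_model[j][m]` is the probability that HMM state j corresponds to SLDM model m. It uses that mapping to project the HMM state posteriors onto the SLDM models, multiplies the result with the SLDM's own model probabilities, and renormalizes. For the second structure it swaps in a generic technique, precision-weighted fusion (product of Gaussians) with diagonal covariances, to combine the SLDM estimate of the clean-speech feature with an HMM-derived Gaussian. The function names `fuse_model_probabilities` and `fuse_gaussian_estimates` are illustrative only.

```python
from typing import List, Sequence, Tuple


def fuse_model_probabilities(
    sldm_probs: Sequence[float],
    hmm_posteriors: Sequence[float],
    state_to_model: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical feedback structure 1: HMM state posteriors reweight
    the SLDM model probabilities.

    state_to_model[j][m] is an ASSUMED mapping: the probability that
    HMM state j corresponds to SLDM model m.
    """
    n_models = len(sldm_probs)
    # Project the HMM state posteriors onto the SLDM models.
    hmm_model_probs = [0.0] * n_models
    for j, gamma in enumerate(hmm_posteriors):
        for m in range(n_models):
            hmm_model_probs[m] += gamma * state_to_model[j][m]
    # Multiply with the SLDM's own model probabilities and renormalize.
    combined = [p * q for p, q in zip(sldm_probs, hmm_model_probs)]
    total = sum(combined)
    if total <= 0.0:
        # The two sources do not overlap: keep the SLDM probabilities unchanged.
        return list(sldm_probs)
    return [c / total for c in combined]


def fuse_gaussian_estimates(
    mean_sldm: Sequence[float],
    var_sldm: Sequence[float],
    mean_hmm: Sequence[float],
    var_hmm: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical feedback structure 2: combine the SLDM feature estimate
    with an HMM-derived Gaussian by precision weighting (product of
    Gaussians), one dimension at a time, ASSUMING diagonal covariances.
    """
    means: List[float] = []
    variances: List[float] = []
    for ms, vs, mh, vh in zip(mean_sldm, var_sldm, mean_hmm, var_hmm):
        if vs <= 0.0 or vh <= 0.0:
            raise ValueError("variances must be positive")
        # Fused precision is the sum of the two precisions.
        var = 1.0 / (1.0 / vs + 1.0 / vh)
        # Fused mean is the precision-weighted average of the two means.
        means.append(var * (ms / vs + mh / vh))
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    # HMM strongly favors state 0, which the assumed mapping ties to model 0.
    print(fuse_model_probabilities([0.5, 0.5], [0.8, 0.2], [[0.9, 0.1], [0.1, 0.9]]))
    # Two equally confident estimates: the mean lands halfway, the variance halves.
    print(fuse_gaussian_estimates([1.0, 0.0], [1.0, 2.0], [3.0, 0.0], [1.0, 2.0]))
```

In the first call, the SLDM is undecided (0.5 / 0.5) and the HMM feedback shifts the model probabilities to roughly 0.74 / 0.26. In the second call, the fused means are [2.0, 0.0] and the variances are [0.5, 1.0]. Precision weighting lets the more confident source, the one with the smaller variance, pull the combined estimate toward itself.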