Multi-environment model adaptation based on vector Taylor series for robust speech recognition

Bibliographic details
Published in:	Pattern Recognition, 2010-09, Vol. 43 (9), pp. 3093-3099
Main authors:	Lü, Yong; Wu, Haiyang; Zhou, Lin; Wu, Zhenyang
Format:	Article
Language:	English
Subjects:	Adaptation; Applied sciences; Exact sciences and technology; Information, signal and communications theory; Mathematical analysis; Mathematical models; Model adaptation; Multi-environment model; Signal processing; Speech; Speech processing; Speech recognition; Taylor series; Telecommunications and information theory; Training; Vector Taylor series; Vectors (mathematics); VTS
Online access:	Full text

Description
Abstract:	In this paper, we propose a multi-environment model adaptation method based on vector Taylor series (VTS) for robust speech recognition. In the training phase, the clean speech is contaminated with noise at different signal-to-noise ratio (SNR) levels to produce several types of noisy training speech, and each type is used to obtain a noisy hidden Markov model (HMM) set. In the recognition phase, the HMM set that best matches the testing environment is selected and then further adjusted by the VTS-based model adaptation method to reduce the environmental mismatch. In the proposed method, the VTS approximation based on noisy training speech is given, and the testing noise parameters are estimated from the noisy testing speech using the expectation-maximization (EM) algorithm. The experimental results indicate that the proposed multi-environment model adaptation method can significantly improve the performance of speech recognizers and outperforms the traditional model adaptation method and the linear regression-based multi-environment method.
ISSN:	0031-3203 (print); 1873-5142 (online)
DOI:	10.1016/j.patcog.2010.03.023
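
The training-phase step in the abstract, contaminating clean speech with noise at several SNR levels, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mean_power`, `mix_at_snr`, `measured_snr_db`), the synthetic sine and Gaussian signals, and the SNR levels of 0, 10 and 20 dB are all assumptions made for illustration, since the record does not state which noise types or levels the paper uses. The only fact relied on is the definition SNR_dB = 10 log10(P_clean / P_noise).

```python
import math
import random


def mean_power(signal):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # Target noise power follows from SNR_dB = 10 * log10(P_clean / P_noise).
    target_noise_power = mean_power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy`, computed against the known clean signal."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(residual))


# Toy stand-ins for real recordings (assumed): a sine wave and Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

# One noisy copy of the training data per SNR level (levels are illustrative).
snr_levels_db = [0, 10, 20]
noisy_sets = {snr: mix_at_snr(clean, noise, snr) for snr in snr_levels_db}
```

In the scheme the abstract describes, each entry of `noisy_sets` would correspond to one type of noisy training speech from which a separate HMM set is trained. Training those models, selecting the best-matching set, and the VTS adaptation itself are outside the scope of this sketch.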