Noisy Constrained Maximum-Likelihood Linear Regression for Noise-Robust Speech Recognition


Bibliographic details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, 2011-02, Vol. 19 (2), p. 315-325
Main authors: Kim, D K; Gales, M J F
Format: Article
Language: English
Description
Summary: Adaptive training is a widely used technique for building speech recognition systems on nonhomogeneous training data. Recently, there has been interest in applying these approaches to situations where there are significant levels of background noise in the training data. Various schemes for adaptive training are based on noise- or speaker-specific transforms of features to yield estimates of the clean speech. However, when there are high levels of background noise, these clean speech estimates may be poor, resulting in degraded performance. In this paper, a new approach for adaptive training on noise-corrupted training data is presented. It extends a popular form of linear transform for model-based adaptation and adaptive training, constrained MLLR (CMLLR), to reflect additional uncertainty from noise-corrupted observations. This new form of adaptation transform is called noisy CMLLR (NCMLLR). NCMLLR uses a modified version of the generative model between clean speech and noisy observations, similar to factor analysis (FA). However, in contrast to FA, here the generative model describes an adaptation transform rather than a covariance matrix structure. The use of NCMLLR for adaptive training using an expectation-maximization approach is described. Discriminative adaptive training with NCMLLR, based on the minimum phone error criterion, is also described. Experimental results comparing NCMLLR with standard adaptive training schemes are given on a noise-corrupted version of Resource Management, the ARPA 1994 CSRNAB Spoke 10 task, and in-car recorded data.
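The difference between CMLLR and NCMLLR described in the summary can be illustrated with a small sketch. It is not taken from this record: it assumes the form commonly cited for these transforms, in which CMLLR scores an observation y as |A| N(Ay + b; mu, Sigma) and NCMLLR adds a variance bias, |A| N(Ay + b; mu, Sigma + Sigma_b). The function names, the restriction to diagonal A and diagonal covariances, and the example values are all illustrative assumptions.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def cmllr_loglik(y, a, b, mean, var):
    """CMLLR-style score: log|A| + log N(A y + b; mean, var).

    Assumption: A is diagonal (entries `a`), so log|A| = sum(log|a_i|).
    """
    x_hat = [ai * yi + bi for ai, yi, bi in zip(a, y, b)]
    log_det = sum(math.log(abs(ai)) for ai in a)
    return log_det + diag_gauss_logpdf(x_hat, mean, var)


def ncmllr_loglik(y, a, b, var_bias, mean, var):
    """NCMLLR-style score: same feature transform, but the model variance
    is widened by a variance bias reflecting uncertainty in the clean
    speech estimate (assumed form, see the lead-in)."""
    x_hat = [ai * yi + bi for ai, yi, bi in zip(a, y, b)]
    log_det = sum(math.log(abs(ai)) for ai in a)
    widened = [v + vb for v, vb in zip(var, var_bias)]
    return log_det + diag_gauss_logpdf(x_hat, mean, widened)


if __name__ == "__main__":
    # Hypothetical values: an observation far from the model mean.
    y, a, b = [10.0, 10.0], [1.0, 1.0], [0.0, 0.0]
    mean, var = [0.0, 0.0], [1.0, 1.0]
    print(cmllr_loglik(y, a, b, mean, var))
    print(ncmllr_loglik(y, a, b, [4.0, 4.0], mean, var))
```

With a zero variance bias the two scores coincide. With a positive bias, an observation whose transformed value lies far from the model mean is penalized less, which is the sense in which the transform reflects additional uncertainty.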
ISSN: 1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI: 10.1109/TASL.2010.2047756