Linear Regression Based Acoustic Adaptation for the Subspace Gaussian Mixture Model

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014-09, Vol. 22 (9), p. 1391-1402
Main Authors: Ghalehjegh, Sina Hamidi; Rose, Richard C.
Format: Article
Language: English
Description
Abstract: This paper presents a study of two acoustic speaker adaptation techniques applied in the context of the subspace Gaussian mixture model (SGMM) for automatic speech recognition (ASR). First, a model space linear regression based approach is presented for adaptation of SGMM state projection vectors and is referred to as subspace vector adaptation (SVA). Second, an easy-to-implement realization of constrained maximum likelihood linear regression (CMLLR) is presented for feature space adaptation in the SGMM. Numerically stable procedures for row-by-row estimation of the regression based transformation matrices are presented for both SVA and CMLLR adaptation. These approaches are applied to SGMM models that are estimated using speaker adaptive training (SAT), a technique for estimating more compact speaker independent acoustic models. Unsupervised speaker adaptation performance is evaluated on conversational and read speech task domains and compared to unsupervised adaptation performance obtained using the hidden Markov model-Gaussian mixture model (HMM-GMM) in ASR. It is shown that the feature space and model space adaptation approaches applied to the SGMM provide complementary reductions in word error rate (WER) and provide lower WERs than those obtained using CMLLR adaptation for the HMM-GMM.
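The row-by-row estimation of the regression transformation matrices mentioned in the abstract follows the general pattern of iterative CMLLR estimation. The sketch below is a minimal, generic illustration of such a row-by-row update in the style of standard CMLLR (Gales-type) estimation; it is not the specific numerically stable procedure developed in the paper. The function name `estimate_cmllr` and the inputs `G` (per-dimension second-order statistics), `k` (per-dimension first-order statistics), and `beta` (total occupancy count) are assumed names for statistics accumulated from aligned adaptation data.

```python
import numpy as np

def estimate_cmllr(G, k, beta, dim, n_iters=20):
    """Row-by-row CMLLR-style transform estimation (generic sketch).

    G:    array of shape (dim, dim+1, dim+1), per-dimension statistics
          sum_m sum_t gamma_m(t)/sigma_{m,i}^2 * zeta_t zeta_t^T
    k:    array of shape (dim, dim+1), per-dimension statistics
          sum_m sum_t gamma_m(t) mu_{m,i}/sigma_{m,i}^2 * zeta_t^T
    beta: total occupancy count (scalar)
    Returns W of shape (dim, dim+1); adapted features are W @ [x; 1].
    """
    W = np.hstack([np.eye(dim), np.zeros((dim, 1))])  # start from identity
    for _ in range(n_iters):
        for i in range(dim):
            A = W[:, :dim]
            # Extended cofactor row of A (zero in the bias position),
            # so that det(A) = w_i @ p by cofactor expansion along row i.
            cof = np.linalg.det(A) * np.linalg.inv(A).T
            p = np.append(cof[i], 0.0)
            Ginv = np.linalg.inv(G[i])
            g_pp = p @ Ginv @ p
            g_kp = k[i] @ Ginv @ p
            # alpha solves: g_pp * alpha^2 + g_kp * alpha - beta = 0;
            # keep the root that maximizes the per-row auxiliary function.
            disc = np.sqrt(g_kp ** 2 + 4.0 * g_pp * beta)
            best_w, best_q = None, -np.inf
            for alpha in ((-g_kp + disc) / (2 * g_pp),
                          (-g_kp - disc) / (2 * g_pp)):
                w = (alpha * p + k[i]) @ Ginv
                q = beta * np.log(abs(w @ p)) - 0.5 * w @ G[i] @ w + w @ k[i]
                if q > best_q:
                    best_q, best_w = q, w
            W[i] = best_w
    return W
```

Given an estimated transform, feature space adaptation applies x_hat = W @ [x; 1] to each frame; the SVA technique described in the paper instead applies a regression transform, estimated with an analogous row-by-row procedure, in model space to the SGMM state projection vectors.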
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2014.2332043