Kernel Eigenspace-Based MLLR Adaptation

Bibliographic Details
Published in:	IEEE transactions on audio, speech, and language processing, 2007-03, Vol.15 (3), p.784-795
Main authors:	Mak, B.K.-W., Hsiao, R.W.-H.
Format:	Article
Language:	English
Subjects:	Adaptation; Adaptation model; Applied sciences; Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization; Coding, codes; composite kernels; eigenspace-based maximum-likelihood linear regression (MLLR) adaptation; eigenvoice speaker adaptation; embedded kernel eigenvoice adaptation; Exact sciences and technology; Hidden Markov models; Information, signal and communications theory; Kernel; kernel eigenvoice adaptation; kernel principal component analysis (PCA); Kernels; Linear regression; Mathematical analysis; Mathematical models; Matrices; Maximum likelihood decoding; Maximum likelihood estimation; Maximum likelihood linear regression; Principal component analysis; Regression; Signal and communications theory; Signal processing; Speech; Speech processing; Speech recognition; Studies; Telecommunications and information theory; Transformations; Vectors

Description
Abstract:	In this paper, we propose an application of kernel methods for fast speaker adaptation based on kernelizing the eigenspace-based maximum-likelihood linear regression adaptation method. We call our new method "kernel eigenspace-based maximum-likelihood linear regression adaptation" (KEMLLR). In KEMLLR, speaker-dependent (SD) models are estimated from a common speaker-independent (SI) model using MLLR adaptation, and the MLLR transformation matrices are mapped to a kernel-induced high-dimensional feature space, wherein kernel principal component analysis is used to derive a set of eigenmatrices. In addition, a composite kernel is used to preserve row information in the transformation matrices. A new speaker's MLLR transformation matrix is then represented as a linear combination of the leading kernel eigenmatrices, which, though it exists only in the feature space, still allows the speaker's mean vectors to be found explicitly. As a result, at the end of KEMLLR adaptation, a regular hidden Markov model (HMM) is obtained for the new speaker, and subsequent speech recognition is as fast as normal HMM decoding. KEMLLR adaptation was tested and compared with other adaptation methods on the Resource Management and Wall Street Journal tasks using 5 or 10 s of adaptation speech. In both cases, KEMLLR adaptation gives the greatest improvement over the SI model, with an 11%-20% word error rate reduction.
ISSN:	1558-7916 2329-9290 1558-7924 2329-9304
DOI:	10.1109/TASL.2006.885941