Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models
Format: | Article |
Language: | English |
Abstract: | In this paper we investigate GMM-derived (GMMD) features for the adaptation
of deep neural network (DNN) acoustic models. A DNN trained on GMMD features is
adapted through maximum a posteriori (MAP) adaptation of the auxiliary GMM used
for GMMD feature extraction. We explore fusion of the adapted GMMD features with
conventional features, such as bottleneck and MFCC features, in two neural
network architectures: DNN and time-delay neural network (TDNN). We analyze and
compare the proposed adaptation approach with other adaptation techniques, such
as i-vectors and feature-space maximum likelihood linear regression (fMLLR), and
explore their complementarity using various types of fusion (feature-level,
posterior-level, lattice-level and others) in order to find the best way to
combine them. Experimental results on the TED-LIUM corpus show that the proposed
adaptation technique can be effectively integrated into DNN and TDNN setups at
different levels and provides additional gains in recognition performance: up to
6% relative word error rate reduction (WERR) over a strong DNN baseline
speaker-adapted with fMLLR, and up to 18% relative WERR over a speaker-independent
(SI) DNN baseline trained on conventional features. For TDNN models, the proposed
approach achieves up to 26% relative WERR over an SI baseline and up to 13% over
a model adapted with i-vectors. An analysis of the adapted GMMD features from
various points of view demonstrates their effectiveness at different levels. |
DOI: | 10.48550/arxiv.2003.06894 |
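The abstract relies on three computations: MAP adaptation of the auxiliary GMM, extraction of GMMD features from that GMM, and relative WERR. Below is a minimal, dependency-free sketch of all three. It makes several assumptions that do not come from the record above. The auxiliary model is taken to be a single diagonal-covariance GMM, although the paper's actual auxiliary model may be structured differently. The GMMD feature vector is taken to be the vector of per-component log-likelihoods, and the paper may apply further processing. MAP adaptation here is the standard relevance-factor update of the means only, with a hypothetical default `tau = 10`. The names `DiagGMM`, `map_adapt_means`, `gmmd_features` and `relative_werr` are illustrative and are not the authors' code.

```python
import math
from typing import List, Sequence

Vector = List[float]


class DiagGMM:
    """Diagonal-covariance GMM (hypothetical stand-in for the auxiliary GMM)."""

    def __init__(self, weights: Sequence[float], means: Sequence[Sequence[float]],
                 variances: Sequence[Sequence[float]]):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def component_loglik(self, x: Sequence[float]) -> Vector:
        """log N(x; mu_k, diag(var_k)) for every component k (weights excluded)."""
        out = []
        for mu, var in zip(self.means, self.variances):
            s = 0.0
            for xd, md, vd in zip(x, mu, var):
                s += math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd
            out.append(-0.5 * s)
        return out

    def posteriors(self, x: Sequence[float]) -> Vector:
        """Component occupation probabilities gamma_k(x), computed via log-sum-exp."""
        scores = [math.log(w) + ll
                  for w, ll in zip(self.weights, self.component_loglik(x))]
        m = max(scores)
        total = m + math.log(sum(math.exp(s - m) for s in scores))
        return [math.exp(s - total) for s in scores]


def map_adapt_means(gmm: DiagGMM, frames: Sequence[Sequence[float]],
                    tau: float = 10.0) -> DiagGMM:
    """Relevance-MAP update of the means only (assumes tau > 0).

    mu_k' = (tau * mu_k + sum_t gamma_k(t) x_t) / (tau + sum_t gamma_k(t)).
    Weights and variances are copied unchanged.
    """
    dim = len(gmm.means[0])
    occ = [0.0] * len(gmm.means)                   # sum_t gamma_k(t)
    first = [[0.0] * dim for _ in gmm.means]       # sum_t gamma_k(t) * x_t
    for x in frames:
        for k, g in enumerate(gmm.posteriors(x)):
            occ[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
    new_means = [[(tau * mu[d] + first[k][d]) / (tau + occ[k]) for d in range(dim)]
                 for k, mu in enumerate(gmm.means)]
    return DiagGMM(gmm.weights, new_means, gmm.variances)


def gmmd_features(gmm: DiagGMM, frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Assumed GMMD features: one vector of per-component log-likelihoods per frame."""
    return [gmm.component_loglik(x) for x in frames]


def relative_werr(wer_baseline: float, wer_adapted: float) -> float:
    """Relative word error rate reduction, in percent."""
    return 100.0 * (wer_baseline - wer_adapted) / wer_baseline


if __name__ == "__main__":
    gmm = DiagGMM([0.5, 0.5], [[0.0], [4.0]], [[1.0], [1.0]])
    speaker_frames = [[1.0], [1.2], [0.8]]           # toy 1-D "speaker data"
    adapted = map_adapt_means(gmm, speaker_frames)   # means shift toward the data
    feats = gmmd_features(adapted, speaker_frames)   # inputs for the DNN
    print(adapted.means, relative_werr(20.0, 17.0))  # WERR: 15.0
```

In this sketch, components with little occupancy stay close to their prior means. That is the usual argument for using MAP rather than maximum-likelihood re-estimation when speaker data is limited. The DNN itself is not modified. Only the features it receives change.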