Discriminative training for full covariance models
Main authors: | , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Summary: | In this paper we revisit discriminative training of full covariance acoustic models for automatic speech recognition. One of the difficult aspects of discriminative training is how to set the constant D that appears in the parameter updates. For diagonal covariance models, this constant D is set based on knowing the smallest value of D, D*, for which the resulting covariances remain positive definite. In this paper we show how to compute D* analytically, and show empirically that knowing this smallest value is important. Our baseline speech recognition models are state-of-the-art broadcast news systems, built using the boosted Maximum Mutual Information criterion and feature space Maximum Mutual Information for feature selection. We show that discriminatively built full covariance models outperform our best diagonal covariance models. Moreover, full covariance models at optimal performance can be obtained by only a few discriminative iterations starting with a diagonal covariance model. The experiments also show that systems utilizing full covariance models are less sensitive to the choice of the number of Gaussians. |
---|---|
ISSN: | 1520-6149 2379-190X |
DOI: | 10.1109/ICASSP.2011.5947557 |