Joint maximum a posteriori adaptation of transformation and HMM parameters

Full description

Bibliographic details
Published in: IEEE Transactions on Speech and Audio Processing, 2001-05, Vol. 9 (4), p. 417-428
Main authors: Siohan, O., Chesta, C., Chin-Hui Lee
Format: Article
Language: English
Description
Abstract: Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test conditions of any speech recognizer. Adaptation techniques can usually be divided into two families of approaches. On one hand, direct model adaptation attempts to directly reestimate the model parameters, for example using MAP adaptation. Since direct adaptation only reestimates the model parameters of the units that appear in the adaptation data, a large amount of such data is needed to observe any significant improvement in performance. However, nice asymptotic properties are usually observed, meaning that the performance improves as the amount of adaptation data increases. On the other hand, indirect model adaptation applies a general transformation to clusters of model parameters. Because each individual model is transformed, the approach is quite effective when only a small amount of adaptation data is available. However, as the amount of adaptation data increases, the performance improvement quickly saturates. We propose to jointly estimate model parameters and transformation parameters using a single estimation criterion based on Bayesian statistics. We show that by providing a prior distribution for the model parameters and the transformation parameters, it is possible to jointly estimate these two sets of parameters using maximum a posteriori (MAP) estimation. Experimental evaluation on nonnative speaker and channel adaptation illustrates the effectiveness of the proposed approach.
ISSN: 1063-6676, 2329-9290, 1558-2353, 2329-9304
DOI: 10.1109/89.917687
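
Illustrative note: the contrast the abstract draws between direct and indirect adaptation can be sketched on Gaussian means alone. The snippet below is a minimal sketch under stated assumptions, not the joint estimator proposed in the article. It assumes hard frame-to-unit alignment, equal known variances, a conjugate Gaussian prior on each mean with weight `tau`, and a single additive bias as a stand-in for the transformation. The function names `map_mean` and `shared_bias` and all parameters are hypothetical.

```python
import numpy as np


def map_mean(prior_mean, tau, frames):
    """Direct adaptation (assumed form): MAP estimate of one Gaussian mean.

    With a conjugate Gaussian prior centred on prior_mean and prior weight
    tau > 0, the estimate interpolates between the prior mean and the
    sample mean of the n adaptation frames aligned to this unit.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float).reshape(-1, prior_mean.shape[0])
    n = frames.shape[0]
    # n = 0 returns the prior mean unchanged: unseen units are not adapted.
    return (tau * prior_mean + frames.sum(axis=0)) / (tau + n)


def shared_bias(prior_means, frames_by_unit):
    """Indirect adaptation stand-in: one bias shared by a cluster of means.

    The bias is the average residual between adaptation frames and the
    prior mean of the unit each frame is aligned to. It is then applied to
    every mean in the cluster, including units with no adaptation data.
    """
    total, count = None, 0
    for unit, frames in frames_by_unit.items():
        mu = np.asarray(prior_means[unit], dtype=float)
        x = np.asarray(frames, dtype=float).reshape(-1, mu.shape[0])
        residual = (x - mu).sum(axis=0)
        total = residual if total is None else total + residual
        count += x.shape[0]
    bias = total / count
    return {u: np.asarray(m, dtype=float) + bias for u, m in prior_means.items()}
```

In this sketch, `map_mean` leaves a unit with no frames at its prior mean and approaches the sample mean as the frame count grows, which mirrors the asymptotic behaviour the abstract attributes to direct adaptation. `shared_bias` moves every unit in the cluster even when only one unit was observed, but its single shared parameter limits how much further data can help.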