Baum-Welch training for segment-based speech recognition

Bibliographic details
Main authors: Shu, H., Hetherington, L.L., Glass, J.
Format: Conference proceeding
Language: English
Description
Summary: The use of segment-based features and segmentation networks in a segment-based speech recognizer complicates the probabilistic modeling because it alters the sample space of all possible segmentation paths and the feature observation space. This paper describes a novel Baum-Welch training algorithm for segment-based speech recognition which addresses these issues by an innovative use of finite-state transducers. This procedure has the desirable property of not requiring the initial seed models that were needed by the Viterbi training procedure we have used previously. On the PhoneBook telephone-based corpus of read isolated words, the Baum-Welch training algorithm obtained a relative error reduction of 37% on the training set and a relative error reduction of 5% on the test set, compared to Viterbi-trained models. When combined with a duration model and a more flexible segmentation network, the Baum-Welch-trained models obtain an overall word error rate of 7.6%, which is the best result we have seen published for the 8,000-word task.
DOI: 10.1109/ASRU.2003.1318401