Designing text corpus using phone-error distribution for acoustic modeling

Bibliographic details
Main authors: Murakami, H., Shinoda, K., Furui, S.
Format: Conference proceedings
Language: English
Description
Summary: It is expensive to prepare a sufficient amount of training data for acoustic modeling when developing large-vocabulary continuous speech recognition systems. This is a serious problem, especially for resource-deficient languages. We propose an active learning method that effectively reduces the amount of training data without any degradation in recognition performance. The method is used to design a text corpus for read speech collection. First, it estimates the phone-error distribution using a small amount of fully transcribed speech data. Second, it constructs a sentence set whose phone-occurrence distribution is close to the phone-error distribution and collects the corresponding speech data. It then extends this process to diphones and triphones and collects more speech data. We evaluated our method with simulation experiments using the Corpus of Spontaneous Japanese. It required only 76 h of speech data to achieve a word accuracy of 74.7%, while the conventional training method required 152 h of data to achieve the same accuracy.
DOI: 10.1109/ASRU.2011.6163929
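
The second step in the summary, building a sentence set whose phone-occurrence distribution is close to the phone-error distribution, can be illustrated with a minimal sketch. The summary does not state which distance measure or search procedure the authors use, so everything here is an assumption made for illustration: the greedy search, the KL divergence as the closeness measure, the additive smoothing constant, and the names `normalize`, `kl_divergence`, and `select_sentences`. The toy phone sequences and error counts are invented, not data from the paper.

```python
import math
from collections import Counter


def normalize(counts, phones, eps=1e-6):
    """Turn raw counts into a smoothed probability distribution over `phones`."""
    total = sum(counts.get(p, 0) for p in phones) + eps * len(phones)
    return {p: (counts.get(p, 0) + eps) / total for p in phones}


def kl_divergence(target, observed):
    """KL(target || observed); both are dicts over the same phone set."""
    return sum(target[p] * math.log(target[p] / observed[p]) for p in target)


def select_sentences(candidates, error_counts, n_select):
    """Greedily pick sentences so that the pooled phone-occurrence
    distribution moves as close as possible to the phone-error distribution.

    candidates:   dict mapping sentence id -> list of phone labels
    error_counts: dict mapping phone label -> number of recognition errors
    (Hypothetical interface; the greedy KL criterion is an assumption.)
    """
    phones = sorted(set(error_counts) | {p for seq in candidates.values() for p in seq})
    target = normalize(error_counts, phones)
    remaining = dict(candidates)
    pooled = Counter()
    selected = []
    while remaining and len(selected) < n_select:
        best_id, best_score = None, math.inf
        for sid, seq in remaining.items():
            # Score the pool as it would look if this sentence were added.
            score = kl_divergence(target, normalize(pooled + Counter(seq), phones))
            if score < best_score:
                best_id, best_score = sid, score
        selected.append(best_id)
        pooled.update(remaining.pop(best_id))
    return selected


# Toy example: the phone "k" is misrecognized three times as often as "a".
candidates = {
    "s1": ["a", "a", "a", "k"],
    "s2": ["k", "k", "k", "a"],
    "s3": ["a", "k", "a", "k"],
}
error_counts = {"a": 1, "k": 3}
selected = select_sentences(candidates, error_counts, n_select=2)
print(selected)
```

Under the same assumptions, the extension to diphones and triphones mentioned in the summary would only change the units being counted: each sentence would be represented by its sequence of diphone or triphone labels rather than single phones.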