Finding Complex Features for Guest Language Fragment Recovery in Resource-Limited Code-Mixed Speech Recognition
Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015-12, Vol. 23 (12), pp. 2148-2161 |
---|---|
Format: | Article |
Language: | English |
Abstract: | The rise of mobile devices and online learning brings into sharp focus the importance of speech recognition not only for the many languages of the world but also for code-mixed speech, especially where English is the second language. The recognition of code-mixed speech, where the speaker mixes languages within a single utterance, is a challenge for both computers and humans, not least because of the limited training data. We conduct research on a Mandarin-English code-mixed lecture corpus, where Mandarin is the host language and English the guest language, and attempt to find complex features for the recovery of English segments that were misrecognized in the initial recognition pass. We propose a multi-level framework wherein both low-level and high-level cues are jointly considered; we use phonotactic, prosodic, and linguistic cues in addition to acoustic-phonetic cues to discriminate at the frame level between English- and Chinese-language segments. We develop a simple and exact method for CRF feature induction, as well as improved methods for using cascaded features derived from the training corpus. By additionally tuning the data imbalance ratio between English and Chinese, we achieve highly significant improvements over previous work in the recovery of English-language segments and outperform DNN-based methods. We obtain considerable performance improvements not only with the traditional GMM-HMM recognition paradigm but also with a state-of-the-art hybrid CD-HMM-DNN recognition framework. |
ISSN: | 2329-9290 (print); 2329-9304 (online) |
DOI: | 10.1109/TASLP.2015.2469634 |
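The abstract describes frame-level discrimination between English and Chinese segments with a CRF but gives no decoding details. The block below is a minimal sketch, assuming a linear-chain CRF over two labels (Mandarin host, English guest). It also assumes that the per-frame scores already sum the weighted acoustic-phonetic, phonotactic, prosodic, and linguistic feature functions. It shows only Viterbi decoding of the best label sequence. The names `viterbi_decode` and `LABELS` and the toy scores are hypothetical illustrations, not the paper's implementation or data.

```python
import numpy as np

# Hypothetical label set: index 0 = host language (Mandarin), 1 = guest (English).
LABELS = ("zh", "en")


def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions:   (T, K) per-frame label scores; assumed here to be the weighted
                 sum of all feature functions for that frame and label.
    transitions: (K, K) score for label i at frame t-1 followed by label j at t.
    """
    T, K = emissions.shape
    score = emissions[0].copy()              # best path score ending in each label at t=0
    backptr = np.zeros((T, K), dtype=int)    # best predecessor label per frame and label
    for t in range(1, T):
        # cand[i, j]: best path ending in i at t-1, then transition i -> j, then emit j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace back from the best final label to frame 0.
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]


# Toy example (made-up scores): six frames, with a cost of 1.0 for each language switch.
emissions = np.array([
    [2.0, 0.0],
    [1.5, 0.5],
    [0.2, 1.0],
    [0.0, 2.0],
    [0.3, 0.6],
    [1.5, 0.0],
])
transitions = np.array([
    [0.0, -1.0],
    [-1.0, 0.0],
])
path = viterbi_decode(emissions, transitions)
print([LABELS[i] for i in path])  # ['zh', 'zh', 'en', 'en', 'en', 'zh']
```

In this sketch, a larger switch penalty in `transitions` favors longer monolingual runs, while a zero matrix reduces decoding to picking each frame's best label independently. How the paper actually sets these weights, and how its feature induction chooses the feature functions, is not stated in the abstract.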