An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification
This paper considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily the host language (usually his native language), but with a few words or phrases in the guest language (usually his second language) inserted in many utterances of the host...
Gespeichert in:
Veröffentlicht in: | IEEE/ACM transactions on audio, speech, and language processing speech, and language processing, 2015-07, Vol.23 (7), p.1144-1159 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This paper considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily the host language (usually his native language), but with a few words or phrases in the guest language (usually his second language) inserted in many utterances of the host language. In this case, not only the languages are switched back and forth within an utterance so the language identification is difficult, but much less data are available for the guest language, which results in poor recognition accuracy for the guest language part. Unit merging approaches on three levels of acoustic modeling (triphone models, HMM states and Gaussians) have been proposed for cross-lingual data sharing for such highly imbalanced bilingual code-switched speech. In this paper, we present an improved overall framework on top of the previously proposed unit merging approaches for recognizing such code-switched speech. This includes unit recovery for reconstructing the identity for units of the two languages after being merged, unit occupancy ranking to offer much more flexible data sharing between units both across languages and within the language based on the accumulated occupancy of the HMM states, and estimation of frame-level language posteriors using blurred posteriorgram features (BPFs) to be used in decoding. We also present a complete set of experimental results comparing all approaches involved for a real-world application scenario under unified conditions, and show very good improvement achieved with the proposed approaches. |
---|---|
ISSN: | 2329-9290 2329-9304 |
DOI: | 10.1109/TASLP.2015.2425214 |