An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification


Bibliographic Details
Published in:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015-07, Vol. 23 (7), p. 1144-1159
Main authors:	Yeh, Ching-feng; Lee, Lin-shan
Format:	Article
Language:	English
Subjects:	Acoustics; Bilingual code-switching; cross-language acoustic modeling; Data models; Hidden Markov models; language identification; Merging; Speech; Speech coding; Speech recognition

Description
Abstract:	This paper considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily the host language (usually his native language), but with a few words or phrases in the guest language (usually his second language) inserted in many utterances of the host language. In this case, not only the languages are switched back and forth within an utterance so the language identification is difficult, but much less data are available for the guest language, which results in poor recognition accuracy for the guest language part. Unit merging approaches on three levels of acoustic modeling (triphone models, HMM states and Gaussians) have been proposed for cross-lingual data sharing for such highly imbalanced bilingual code-switched speech. In this paper, we present an improved overall framework on top of the previously proposed unit merging approaches for recognizing such code-switched speech. This includes unit recovery for reconstructing the identity for units of the two languages after being merged, unit occupancy ranking to offer much more flexible data sharing between units both across languages and within the language based on the accumulated occupancy of the HMM states, and estimation of frame-level language posteriors using blurred posteriorgram features (BPFs) to be used in decoding. We also present a complete set of experimental results comparing all approaches involved for a real-world application scenario under unified conditions, and show very good improvement achieved with the proposed approaches.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2015.2425214