Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2006-01, Vol. 14 (1), pp. 266-276
Main authors:	WU, Chung-Hsien; CHIU, Yu-Hsien; SHIA, Chi-Jiun; LIN, Chun-Yu
Format:	Article
Language:	English
Subject terms:	Acoustics; Application software; Applied sciences; Bayesian methods; Boundaries; Dynamic programming; Economic models; Exact sciences and technology; Filtering; Gaussian mixture model; Information, signal and communications theory; language identification; latent semantic analysis; Mandarins; Mathematical models; Maximum likelihood detection; Maximum likelihood estimation; mixed-language speech; Natural languages; Noise generators; Principal component analysis; Recall; Segmentation; Segments; Signal processing; single-language speech; Speech; Speech analysis; Speech processing; Studies; Telecommunications and information theory

Description
Abstract:	This paper proposes an approach to segmenting and identifying mixed-language speech. A delta Bayesian information criterion (delta-BIC) is first applied to segment the input speech utterance into a sequence of language-dependent segments using acoustic features. A VQ-based bigram model is used to characterize the acoustic-phonetic dynamics of two consecutive codewords in a language. In this way, the language-specific acoustic-phonetic properties of phone sequences are integrated into the identification process. A Gaussian mixture model (GMM) is used to model the codeword occurrence vectors of each language-dependent segment after they are orthonormally transformed using latent semantic analysis (LSA). A filtering method then smooths the hypothesized language sequence, eliminating the noise-like components that maximum likelihood estimation introduces into the detected language sequence. Finally, dynamic programming is used to determine the language boundaries globally. Experimental results on Mandarin, English, and Taiwanese show a recall rate of 0.87 for language boundary segmentation. With this segmentation recall, the proposed approach achieved language identification accuracies of 92.1% and 74.9% for single-language and mixed-language speech, respectively.
ISSN:	1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI:	10.1109/TSA.2005.852992
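The abstract names delta-BIC as the first segmentation stage but does not give its exact formulation, window length, or penalty weight. As a minimal sketch, assuming the common single-Gaussian, full-covariance form of delta-BIC change detection (where a positive score favors a boundary), the code below scores every candidate split of a window of feature frames and reports the best split only if its score is positive. The function names (`delta_bic`, `detect_change`), the penalty weight `lam=1.0`, the 20-frame minimum segment length, the small covariance ridge, and the synthetic 13-dimensional frames are illustrative assumptions, not values or data from the paper.

```python
from typing import Optional

import numpy as np


def _logdet_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood (biased) covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    # Assumed small ridge keeps the determinant finite for short segments.
    cov = cov + 1e-6 * np.eye(cov.shape[0])
    _, logdet = np.linalg.slogdet(cov)
    return float(logdet)


def delta_bic(x: np.ndarray, i: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting window x (N frames x d features) into x[:i] and x[i:].

    The whole window and each part are modeled by one full-covariance Gaussian:
      dBIC = N/2 log|S| - N1/2 log|S1| - N2/2 log|S2| - lam * P,
      P    = 1/2 * (d + d(d+1)/2) * log N.
    Positive values favor a boundary at frame i; lam=1.0 is an assumed default.
    """
    n, d = x.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return float(
        0.5 * n * _logdet_cov(x)
        - 0.5 * i * _logdet_cov(x[:i])
        - 0.5 * (n - i) * _logdet_cov(x[i:])
        - lam * penalty
    )


def detect_change(x: np.ndarray, min_seg: int = 20, lam: float = 1.0) -> Optional[int]:
    """Return the split index with the largest positive delta-BIC, or None if none is positive.

    min_seg (hypothetical parameter) keeps both parts long enough to estimate a covariance.
    """
    best_i: Optional[int] = None
    best_score = 0.0
    for i in range(min_seg, len(x) - min_seg + 1):
        score = delta_bic(x, i, lam)
        if score > best_score:
            best_i, best_score = i, score
    return best_i


# Usage with synthetic 13-dimensional frames standing in for real acoustic features:
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0.0, 1.0, (200, 13)), rng.normal(3.0, 1.0, (200, 13))])
print(detect_change(frames))
```

Because the synthetic frames change their mean at frame 200, `detect_change` should report a split at or near frame 200, while a statistically homogeneous window should return `None`. In the pipeline the abstract describes, such splits would only be candidate boundaries; the VQ bigram model, LSA-based GMM identification, filtering, and dynamic-programming stages are not reproduced here.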