Experiments on Cross-Language Attribute Detection and Phone Recognition With Minimal Target-Specific Training Data

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2012-03, Vol. 20 (3), pp. 875-887
Main authors:	Siniscalchi, S. M.; Dau-Cheng Lyu; Svendsen, T.; Chin-Hui Lee
Format:	Article
Language:	English
Subjects:	Acoustics; Applied sciences; Exact sciences and technology; Information, signal and communications theory; Knowledge-based system; Materials; phonetic features; Signal processing; Speech; Speech processing; Speech recognition; Target recognition; Telecommunications and information theory; Training data; universal acoustic modeling

Description
Abstract:	A state-of-the-art automatic speech recognition (ASR) system can often achieve high accuracy for most spoken languages of interest if a large amount of speech material can be collected and used to train a set of language-specific acoustic phone models. However, designing good ASR systems with little or no language-specific speech data for resource-limited languages is still a challenging research topic. As a consequence, there has been an increasing interest in exploring knowledge sharing among a large number of languages so that a universal set of acoustic phone units can be defined to work for multiple or even for all languages. This work aims at demonstrating that a recently proposed automatic speech attribute transcription framework can play a key role in designing language-universal acoustic models by sharing speech units among all target languages at the acoustic phonetic attribute level. The language-universal acoustic models are evaluated through phone recognition. It will be shown that good cross-language attribute detection and continuous phone recognition performance can be accomplished for "unseen" languages using minimal training data from the target languages to be recognized. Furthermore, a phone-based background model (PBM) approach will be presented to improve attribute detection accuracies.
ISSN:	1558-7916 1558-7924
DOI:	10.1109/TASL.2011.2167610