Submodular Based Unsupervised Data Selection

Automatic speech recognition (ASR) and keyword search (KWS) have more and more found their way into our everyday lives, and their successes could boil down lots of factors. In these factors, large scale of speech data used for acoustic modeling is the key factor. However, it is difficult and time-co...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEICE Transactions on Information and Systems 2018/06/01, Vol.E101.D(6), pp.1591-1604
Hauptverfasser:	ZHANG, Aiying, NI, Chongjia
Format:	Artikel
Sprache:	eng
Schlagworte:	Acoustics Artificial neural networks Automatic speech recognition Data transfer (computers) keyword spotting Language Language identification Languages Modelling multilingual data selection Multilingualism Neural networks Performance enhancement recurrent neural network long short term memory Speech recognition submodular Transcription Voice recognition
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Automatic speech recognition (ASR) and keyword search (KWS) have more and more found their way into our everyday lives, and their successes could boil down lots of factors. In these factors, large scale of speech data used for acoustic modeling is the key factor. However, it is difficult and time-consuming to acquire large scale of transcribed speech data for some languages, especially for low-resource languages. Thus, at low-resource condition, it becomes important with which transcribed data for acoustic modeling for improving the performance of ASR and KWS. In view of using acoustic data for acoustic modeling, there are two different ways. One is using the target language data, and another is using large scale of other source languages data for cross-lingual transfer. In this paper, we propose some approaches for efficient selecting acoustic data for acoustic modeling. For target language data, a submodular based unsupervised data selection approach is proposed. The submodular based unsupervised data selection could select more informative and representative utterances for manual transcription for acoustic modeling. For other source languages data, the high misclassified as target language based submodular multilingual data selection approach and knowledge based group multilingual data selection approach are proposed. When using selected multilingual data for multilingual deep neural network training for cross-lingual transfer, it could improve the performance of ASR and KWS of target language. When comparing our proposed multilingual data selection approach with language identification based multilingual data selection approach, our proposed approach also obtains better effect. In this paper, we also analyze and compare the language factor and the acoustic factor influence on the performance of ASR and KWS. The influence of different scale of target language data on the performance of ASR and KWS at mono-lingual condition and cross-lingual condition are also compared and analyzed, and some significant conclusions can be concluded.
ISSN:	0916-8532 1745-1361
DOI:	10.1587/transinf.2017EDP7367