Continuous Kannada Speech Recognition System Under Degraded Condition

In this paper, a continuous Kannada speech recognition system is developed under different noisy conditions. The continuous Kannada speech sentences are collected from 2400 speakers across different dialect regions of Karnataka state (a state in the southwestern region of India where Kannada is the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Circuits, systems, and signal processing systems, and signal processing, 2020, Vol.39 (1), p.391-419
Hauptverfasser:	Praveen Kumar, P. S., Thimmaraja Yadava, G., Jayanna, H. S.
Format:	Artikel
Sprache:	eng
Schlagworte:	Acoustic noise Artificial neural networks Automatic speech recognition Circuits and Systems Deep learning Electrical Engineering Electronics and Microelectronics Engineering Error analysis Instrumentation Interactive systems Kannada language Markov analysis Markov chains Modelling Phonemes Probabilistic models Regional dialects Sentences Signal,Image and Speech Processing Speech Speech recognition Transcription Transliteration Voice recognition
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In this paper, a continuous Kannada speech recognition system is developed under different noisy conditions. The continuous Kannada speech sentences are collected from 2400 speakers across different dialect regions of Karnataka state (a state in the southwestern region of India where Kannada is the principal language). The word-level transcription and validation of speech data are done by using Indic transliteration tool (IT3:UTF-8). The Kaldi toolkit is used for the development of automatic speech recognition (ASR) models at different phoneme levels. The lexicon and phoneme set are created afresh for continuous Kannada speech sentences. The 80% and 20% of validated speech data are used for system training and testing using Kaldi. The performance of the system is verified by the parameter called word error rate (WER). The acoustic models were built using the techniques such as monophone, triphone1, triphone2, triphone3, subspace Gaussian mixture models (SGMM), combination of deep neural network (DNN) and hidden Markov model (HMM), combination of DNN and SGMM and combination of SGMM and maximum mutual information. The experiment is conducted to determine the WER using different modeling techniques. The results show that the recognition rate obtained through the combination of DNN and HMM outperforms over conventional-based ASR modeling techniques. An interactive voice response system is developed to build an end-to-end ASR system to recognize continuous Kannada speech sentences. The developed ASR system is tested by 300 speakers of Karnataka state under uncontrolled environment.
ISSN:	0278-081X 1531-5878
DOI:	10.1007/s00034-019-01189-9