Feature Extraction Techniques with Analysis of Confusing Words for Speech Recognition in the Hindi Language


Bibliographic details
Published in: Wireless personal communications 2021-06, Vol.118 (4), p.3303-3333
Main authors: Bhatt, Shobha; Jain, Anurag; Dev, Amita
Format: Article
Language: English
Online access: Full text
Description
Summary: This research presents experimental work to build a speaker-independent, connected-word Hindi speech recognition system using different feature extraction techniques, together with a comparative analysis of confusing words. Comparative analysis of confusing words is essential for understanding the causes of speech recognition errors. Based on the error analysis, different feature extraction techniques, classification techniques, acoustic models, and pronunciation dictionaries can be selected to improve the speech recognition system's performance. Earlier studies of Hindi speech recognition lack a detailed comparative analysis of confusing words across different feature extraction methods. Because speaker-independent systems are developed for all users, the comparative analysis of confusing words is presented for all feature extraction techniques. The speaker-independent system was built with five-state, monophone-based hidden Markov models (HMMs) using the HMM toolkit HTK. A self-created Hindi speech corpus was used in the experiments. Linear predictive coding cepstral coefficients (LPCCs), mel frequency cepstral coefficients (MFCCs), and perceptual linear prediction coefficients (PLPs) were applied with delta, double delta, and energy parameters to evaluate the performance of the proposed methodology. The system was assessed with each feature extraction technique in speaker-independent mode. The findings show that PLP coefficients achieve the highest recognition score, while LPCCs achieve the lowest. Both PLP and MFCC coefficients perform better than LPCCs in speech recognition. Comparative analysis of confusing words shows that PLPs and MFCCs produce fewer confusions than LPCCs and exhibit mostly the same pattern in the confusion analysis. The outcomes also show that substitution errors are a significant cause of low recognition. It was also found that some words were recognized by only one of the feature extraction techniques. Confusion analysis indicates that words beginning with a nasal, liquid, or fricative sound exhibit more confusions. The investigation could improve speech recognition by choosing an appropriate feature extraction method and by combining feature extraction methods. The research outcomes can also be used to build linguistic resources for improving speech recognition. The results show that the developed re
ISSN: 0929-6212, 1572-834X
DOI: 10.1007/s11277-021-08181-0
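
The confusing-word analysis mentioned in the summary depends on aligning each recognized word sequence with its reference transcription and counting substitutions, deletions, and insertions. The following is a minimal sketch of that idea, not the authors' procedure: it assumes a plain Levenshtein alignment with unit costs (HTK's own scoring tool may weight errors differently), and the function names `align`, `confusion_counts`, and `word_accuracy`, as well as the romanized example words, are hypothetical and are not taken from the paper's corpus.

```python
from collections import Counter


def align(ref, hyp):
    """Align two word lists with unit-cost edit distance.

    Returns a list of (op, ref_word, hyp_word) where op is one of
    "ok", "sub", "del", "ins". Unit costs are an assumption.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def confusion_counts(pairs):
    """Count error types and (reference, recognized) substitution pairs."""
    totals = Counter()
    confusions = Counter()
    for ref, hyp in pairs:
        for op, r, h in align(ref, hyp):
            totals[op] += 1
            if op == "sub":
                confusions[(r, h)] += 1
    return totals, confusions


def word_accuracy(totals):
    """Word accuracy (N - D - S - I) / N, with N the number of reference words."""
    n = totals["ok"] + totals["sub"] + totals["del"]
    return (n - totals["del"] - totals["sub"] - totals["ins"]) / n


# Illustrative data only: (reference, recognized) word sequences.
example_pairs = [
    (["ek", "do", "teen"], ["ek", "no", "teen"]),
    (["char", "paanch"], ["char"]),
]
example_totals, example_confusions = confusion_counts(example_pairs)
```

Running `confusion_counts` once per feature extraction technique and comparing the resulting `confusions` counters side by side is one way to see which word pairs are confused under every front end and which only under one of them.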