Feature Extraction Techniques with Analysis of Confusing Words for Speech Recognition in the Hindi Language

Bibliographic details
Published in: Wireless personal communications 2021-06, Vol.118 (4), p.3303-3333
Main authors: Bhatt, Shobha; Jain, Anurag; Dev, Amita
Format: Article
Language: English
Online access: Full text
container_end_page 3333
container_issue 4
container_start_page 3303
container_title Wireless personal communications
container_volume 118
creator Bhatt, Shobha
Jain, Anurag
Dev, Amita
description The research work presents experimental work to build a speaker-independent connected-word Hindi speech recognition system using different feature extraction techniques, together with a comparative analysis of confusing words. Comparative analysis of confusing words is essential for understanding the causes of speech recognition errors. Based on this error analysis, suitable feature extraction techniques, classification techniques, acoustic models, and pronunciation dictionaries can be selected to improve the system's performance. Earlier studies of Hindi speech recognition lack a detailed comparative analysis of confusing words across feature extraction methods. Because speaker-independent systems are intended for all users, the comparative analysis of confusing words is presented for every feature extraction technique. The speaker-independent system was built with five-state monophone-based hidden Markov models (HMMs) using the HMM Toolkit (HTK). A self-created Hindi speech corpus was used in the experiments. Linear predictive coding cepstral coefficients (LPCCs), mel frequency cepstral coefficients (MFCCs), and perceptual linear prediction coefficients (PLPs) were applied with delta, double-delta, and energy parameters to evaluate the proposed methodology. The system was assessed with each feature extraction technique in speaker-independent mode. The findings show that PLP coefficients give the highest recognition score and LPCCs the lowest, and that both PLPs and MFCCs outperform LPCCs. The comparative analysis of confusing words shows that PLPs and MFCCs produce fewer confusions than LPCCs and exhibit largely the same confusion pattern. Substitution errors were found to be a major cause of low recognition, and some words were recognized only by a particular feature extraction technique. The confusion analysis indicates that words beginning with nasal, liquid, or fricative sounds are confused more often. These findings could help improve speech recognition by guiding the choice of an appropriate feature extraction method or the combination of several methods, and the outcomes can also be used to build linguistic resources for speech recognition. The developed recognition framework achieved its highest word recognition accuracy, 76.68%, with PLPs for the speaker-independent model. The proposed system was also compared with similar existing work.
doi_str_mv 10.1007/s11277-021-08181-0
format Article
publisher New York: Springer US; Springer Nature B.V.
date 2021-06-01
stitle Wireless Pers Commun
tpages 31
rights The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021
toplevel peer_reviewed
linktohtml https://link.springer.com/10.1007/s11277-021-08181-0
linktopdf https://link.springer.com/content/pdf/10.1007/s11277-021-08181-0
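The description says delta, double-delta, and energy parameters were appended to the static coefficients. Below is a minimal pure-Python sketch of the regression formula commonly used for this, in the form documented for HTK. It rests on several assumptions: a window of 2 frames on each side, and edge frames replicated at the utterance boundaries. The paper's actual window sizes are not given in this record, and the function names `deltas` and `add_dynamic_features` are hypothetical. With 12 cepstra plus energy as static features, the result would have 39 values per frame. That is a common configuration but not one confirmed here.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression deltas: d_t = sum_th th * (c_{t+th} - c_{t-th}) / (2 * sum_th th^2).

    Indices outside the utterance are clamped, i.e. the first and last
    frames are replicated at the boundaries (an assumed edge policy).
    """
    if not frames:
        return []
    last = len(frames) - 1
    denom = 2 * sum(th * th for th in range(1, window + 1))
    out: List[Frame] = []
    for t in range(len(frames)):
        acc = [0.0] * len(frames[t])
        for th in range(1, window + 1):
            nxt = frames[min(t + th, last)]
            prv = frames[max(t - th, 0)]
            for i in range(len(acc)):
                acc[i] += th * (nxt[i] - prv[i])
        out.append([x / denom for x in acc])
    return out


def add_dynamic_features(static: List[Frame], window: int = 2) -> List[Frame]:
    """Append deltas and double deltas (deltas of the deltas) to each static frame."""
    d = deltas(static, window)
    dd = deltas(d, window)
    return [s + a + b for s, a, b in zip(static, d, dd)]


if __name__ == "__main__":
    # Toy input: 10 frames of [ramp value, constant]; real input would be
    # per-frame cepstral coefficients plus log energy.
    frames = [[float(t), 1.0] for t in range(10)]
    feats = add_dynamic_features(frames)
    print(len(feats[0]))  # 6: 2 static + 2 delta + 2 double-delta
    print(feats[5])       # [5.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

A linearly rising coefficient gets a delta of 1 away from the edges and a double delta of 0. A constant coefficient gets 0 for both. Replicating edge frames shrinks the deltas near the start and end of an utterance, which is one reason the boundary policy matters when comparing feature sets.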
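LPCCs are cepstral coefficients derived from a linear-prediction model of each frame. The sketch below follows the textbook route:

1. Compute the frame's autocorrelation.
2. Solve for the predictor coefficients with the Levinson-Durbin recursion.
3. Convert the predictor coefficients to cepstra with the standard recursion.

It assumes the sign convention s[n] ≈ Σ a_k s[n-k]. It omits pre-emphasis, windowing, the gain term c_0, and liftering. It is not claimed to match HTK's implementation or the paper's parameter settings, and the function names are hypothetical.

```python
from typing import List, Tuple


def autocorrelation(frame: List[float], order: int) -> List[float]:
    """r[k] = sum_t x[t] * x[t + k] for k = 0..order (unnormalised)."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Predictor coefficients a_1..a_p (s[n] ~ sum_k a_k s[n-k]) and the residual energy."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("non-positive prediction error (e.g. an all-zero frame)")
        # Reflection coefficient for model order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: List[float], n_ceps: int) -> List[float]:
    """Cepstra c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k).

    c_n = a_n + sum_{k=max(1, n-p)}^{n-1} (k / n) * c_k * a_{n-k}, with a_n = 0 for n > p.
    The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] unused
    for n in range(1, n_ceps + 1):
        total = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            total += (k / n) * c[k] * a[n - k - 1]
        c[n] = total
    return c[1:]


if __name__ == "__main__":
    # Autocorrelation of a first-order process with coefficient 0.5:
    # the recursion should recover a = [0.5, 0.0].
    a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(a, err)                 # [0.5, 0.0] 0.75
    print(lpc_to_cepstrum(a, 4))  # approximately 0.5**n / n for n = 1..4
```

For a single-pole model 1 / (1 - a z^-1), the cepstrum is known in closed form as c_n = a^n / n. That gives a convenient check on the recursion's indexing.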
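The record reports a word accuracy of 76.68% and names substitution errors as a major cause of low recognition. HTK's HResults tool reports two figures, where N is the number of reference words and S, D, and I count substitutions, deletions, and insertions:

- percent correct: (N - D - S) / N
- accuracy: (N - D - S - I) / N

This record does not say which of the two figures is meant by 76.68%. Below is a minimal sketch of that scoring, with three assumptions:

- Reference and recognized word sequences are aligned by minimum edit distance with unit costs. HResults uses its own alignment weights, so counts may differ in tie cases.
- Substitution pairs are tallied as a simple word-confusion table.
- The romanized Hindi words in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (reference word, recognized word); None marks a gap


def align(ref: Sequence[str], hyp: Sequence[str]) -> List[Pair]:
    """Minimum-edit-distance alignment with unit substitution/insertion/deletion costs."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right cell, preferring match/substitution moves.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def evaluate(utterances: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> Dict[str, object]:
    """Accumulate N, S, D, I over all utterances; return %correct, %accuracy, confusions."""
    n = s = d = ins = 0
    confusions: Counter = Counter()
    for ref, hyp in utterances:
        for r, h in align(ref, hyp):
            if r is None:
                ins += 1
                continue
            n += 1
            if h is None:
                d += 1
            elif r != h:
                s += 1
                confusions[(r, h)] += 1
    if n == 0:
        raise ValueError("no reference words to score")
    return {
        "N": n, "S": s, "D": d, "I": ins,
        "correct": 100.0 * (n - d - s) / n,
        "accuracy": 100.0 * (n - d - s - ins) / n,
        "confusions": confusions,
    }


if __name__ == "__main__":
    result = evaluate([("ek do teen char".split(), "ek do tin char paanch".split())])
    print(result["correct"], result["accuracy"])  # 75.0 50.0
    print(result["confusions"].most_common())    # [(('teen', 'tin'), 1)]
```

Insertions lower accuracy but not percent correct. Sorting the confusion counter by frequency gives the kind of per-word confusion list that a comparison across LPCC, MFCC, and PLP front ends would inspect.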
fulltext fulltext
identifier ISSN: 0929-6212
ispartof Wireless personal communications, 2021-06, Vol.118 (4), p.3303-3333
issn 0929-6212
1572-834X
language eng
recordid cdi_proquest_journals_2533363928
source SpringerLink Journals - AutoHoldings
subjects Coefficients
Communications Engineering
Comparative analysis
Computer Communication Networks
Confusion
Engineering
Error analysis
Feature extraction
Feature recognition
Linear prediction
Markov chains
Mathematical models
Networks
Pattern analysis
Performance evaluation
Signal,Image and Speech Processing
Speech
Speech recognition
Voice recognition
Words (language)
title Feature Extraction Techniques with Analysis of Confusing Words for Speech Recognition in the Hindi Language
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T18%3A15%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Feature%20Extraction%20Techniques%20with%20Analysis%20of%20Confusing%20Words%20for%20Speech%20Recognition%20in%20the%20Hindi%20Language&rft.jtitle=Wireless%20personal%20communications&rft.au=Bhatt,%20Shobha&rft.date=2021-06-01&rft.volume=118&rft.issue=4&rft.spage=3303&rft.epage=3333&rft.pages=3303-3333&rft.issn=0929-6212&rft.eissn=1572-834X&rft_id=info:doi/10.1007/s11277-021-08181-0&rft_dat=%3Cproquest_cross%3E2533363928%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2533363928&rft_id=info:pmid/&rfr_iscdi=true