Feature Extraction Techniques with Analysis of Confusing Words for Speech Recognition in the Hindi Language
This work presents experiments in building a speaker-independent, connected-word Hindi speech recognition system using different feature extraction techniques, together with a comparative analysis of confusing words. Comparative analysis of confusing words is essential to understand the reason for the...
Published in: | Wireless personal communications 2021-06, Vol.118 (4), p.3303-3333 |
---|---|
Main authors: | Bhatt, Shobha ; Jain, Anurag ; Dev, Amita |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 3333 |
---|---|
container_issue | 4 |
container_start_page | 3303 |
container_title | Wireless personal communications |
container_volume | 118 |
creator | Bhatt, Shobha ; Jain, Anurag ; Dev, Amita |
description | This work presents experiments in building a speaker-independent, connected-word Hindi speech recognition system using different feature extraction techniques, together with a comparative analysis of confusing words. Comparative analysis of confusing words is essential to understanding the causes of speech recognition errors. Based on the error analysis, feature extraction techniques, classification techniques, acoustic models, and pronunciation dictionaries can be selected to improve the system's performance. Earlier studies of Hindi speech recognition lack a detailed comparative analysis of confusing words across feature extraction methods. Because speaker-independent systems are meant to serve all users, the analysis of confusing words is presented for every feature extraction technique. The speaker-independent system uses five-state, monophone-based hidden Markov models (HMMs) built with the HMM toolkit HTK. A self-created Hindi speech corpus was used in the experiments. Linear predictive coding cepstral coefficients (LPCCs), mel-frequency cepstral coefficients (MFCCs), and perceptual linear prediction coefficients (PLPs) were applied with delta, double-delta, and energy parameters to evaluate the proposed methodology. The system was assessed with each feature extraction technique in speaker-independent mode. PLP coefficients achieved the highest recognition score and LPCCs the lowest; both PLPs and MFCCs outperformed LPCCs. The comparative analysis of confusing words shows that PLPs and MFCCs produce fewer confusions than LPCCs and exhibit mostly the same confusion pattern. Substitution errors are a significant cause of low recognition. Some words were recognized with only one of the feature extraction techniques. The confusion analysis indicates that words with a nasal, liquid, or fricative sound in the initial position exhibit more confusions. The findings could improve speech recognition by guiding the choice of an appropriate feature extraction method and the combination of several methods, and they can also be used to build linguistic resources for improving speech recognition. The developed recognition framework achieved its highest word recognition accuracy, 76.68%, with PLPs for the speaker-independent model. The proposed system was also compared with existing similar work. |
doi_str_mv | 10.1007/s11277-021-08181-0 |
format | Article |
fullrecord | Publisher: New York: Springer US ; EISSN: 1572-834X ; Pages: 31 ; Peer reviewed ; Rights: The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021 ; Full text: https://link.springer.com/10.1007/s11277-021-08181-0 |
fulltext | fulltext |
identifier | ISSN: 0929-6212 |
ispartof | Wireless personal communications, 2021-06, Vol.118 (4), p.3303-3333 |
issn | 0929-6212 1572-834X |
language | eng |
recordid | cdi_proquest_journals_2533363928 |
source | SpringerLink Journals - AutoHoldings |
subjects | Coefficients ; Communications Engineering ; Comparative analysis ; Computer Communication Networks ; Confusion ; Engineering ; Error analysis ; Feature extraction ; Feature recognition ; Linear prediction ; Markov chains ; Mathematical models ; Networks ; Pattern analysis ; Performance evaluation ; Signal, Image and Speech Processing ; Speech ; Speech recognition ; Voice recognition ; Words (language) |
title | Feature Extraction Techniques with Analysis of Confusing Words for Speech Recognition in the Hindi Language |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T18%3A15%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Feature%20Extraction%20Techniques%20with%20Analysis%20of%20Confusing%20Words%20for%20Speech%20Recognition%20in%20the%20Hindi%20Language&rft.jtitle=Wireless%20personal%20communications&rft.au=Bhatt,%20Shobha&rft.date=2021-06-01&rft.volume=118&rft.issue=4&rft.spage=3303&rft.epage=3333&rft.pages=3303-3333&rft.issn=0929-6212&rft.eissn=1572-834X&rft_id=info:doi/10.1007/s11277-021-08181-0&rft_dat=%3Cproquest_cross%3E2533363928%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2533363928&rft_id=info:pmid/&rfr_iscdi=true |