Comparative analysis of Dysarthric speech recognition: multiple features and robust templates
Published in: Multimedia Tools and Applications, 2022-09, Vol. 81 (22), p. 31245-31259
Main authors: Revathi, Arunachalam; Nagakrishnan, R.; Sasikaladevi, N.
Format: Article
Language: English
Online access: Full text
Description: Research on recognizing the speech of unimpaired speakers has been in practice for many years. However, a complete system for recognizing the speech of persons with a speech impairment is still under development. In this work, an isolated-digit recognition system is developed to recognize the speech of people affected by dysarthria. Because the utterances of dysarthric speakers exhibit erratic behavior, developing a robust speech recognition system is particularly challenging; even manual recognition of their speech can be futile. This work analyzes the use of multiple features and speech enhancement techniques in implementing a cluster-based speech recognition system for dysarthric speakers. Speech enhancement techniques are used to improve speech intelligibility and reduce the distortion level of the utterances. The system is evaluated using gammatone energy (GFE) features with filters calibrated on different non-linear frequency scales, Stockwell features, the modified group delay cepstrum (MGDFC), speech enhancement techniques, and a vector-quantization (VQ) based classifier. Decision-level fusion of all features and speech enhancement techniques yielded a 4% word error rate (WER) for the speaker with 6% speech intelligibility. Experimental evaluation provided better results than subjective assessment of the utterances of dysarthric speakers. The system was also evaluated for a dysarthric speaker with 95% speech intelligibility; WER is 0% for all digits under decision-level fusion of speech enhancement techniques and GFE features. This system can be used as an assistive tool by caretakers of people affected by dysarthria.
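The abstract's "VQ based classifier" works by training one codebook of representative feature vectors per digit and assigning an utterance to the digit whose codebook reconstructs its frames with the lowest average distortion. The sketch below illustrates that idea only; the actual feature extraction (gammatone energies, Stockwell features, MGDFC) and enhancement stages from the paper are out of scope, and the toy 2-D "feature vectors" are purely illustrative assumptions.

```python
# Minimal sketch of a vector-quantization (VQ) classifier: one k-means
# codebook per class; classify by lowest average distortion.
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Lloyd's algorithm; returns a codebook of k codewords."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, k)
    for _ in range(iters):
        # Assign each training vector to its nearest codeword.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda j: dist2(v, codebook[j]))
            clusters[i].append(v)
        # Move each codeword to the centroid of its cluster.
        for i, members in enumerate(clusters):
            if members:
                codebook[i] = tuple(sum(x) / len(members) for x in zip(*members))
    return codebook


def avg_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(dist2(f, c) for c in codebook) for f in frames) / len(frames)


def classify(frames, codebooks):
    """Label of the codebook that explains the frames best."""
    return min(codebooks, key=lambda lbl: avg_distortion(frames, codebooks[lbl]))
```

In use, one codebook would be trained per digit from feature frames of that digit's training utterances, and a test utterance's frames would be passed to `classify`; decision-level fusion, as described in the abstract, would then combine such decisions across feature streams.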
DOI: 10.1007/s11042-022-12937-6
Publisher: Springer US (New York)
ISSN: 1380-7501; EISSN: 1573-7721
Source: SpringerLink Journals - AutoHoldings
Subjects: Computer Communication Networks; Computer Science; Data Structures and Information Theory; Dysarthria; Evaluation; Feature recognition; Group delay; Intelligibility; Multimedia Information Systems; Robustness; Special Purpose and Application-Based Systems; Speech processing; Speech recognition; Speeches; Subjective assessment; Voice recognition