Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2
Automatic speech recognition (ASR) provides a convenient and fast mode of communication between humans and computers, and its accuracy has improved steadily over time. However, most ASR systems are trained on native English accents; while they serve native English speakers well, their accuracy drops drastically for non-native English accents. Our proposed model addresses this limitation. We fine-tuned the DeepSpeech2 model, pretrained on the native-English-accent LibriSpeech dataset, by retraining it on a subset of the Common Voice dataset containing only South Asian accents, using the proposed novel loss function. We experimented with three different layer configurations of the model to learn the best features for South Asian accents. Three evaluation metrics were used: word error rate (WER), match error rate (MER), and word information loss (WIL). The results show that DeepSpeech2 performs significantly better on South Asian accents if the weights of the initial convolutional layers are retained while the weights of the deeper layers (i.e., the RNN and fully connected layers) are updated. Our model achieved a WER of 18.08%, the lowest error for non-native English accents in comparison with the original model.
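The transfer-learning idea described in the abstract, keeping the pretrained convolutional feature extractor frozen while fine-tuning the recurrent and fully connected layers on accented speech, can be illustrated with a short PyTorch sketch. The network below is a simplified, hypothetical DeepSpeech2-style model; its `conv`, `rnn`, and `fc` submodules, layer sizes, and the CTC-loss stand-in are illustrative assumptions, not the authors' actual code or their proposed loss function.

```python
import torch
import torch.nn as nn

# Hypothetical DeepSpeech2-style network: conv feature extractor -> bidirectional GRUs -> linear output.
# Shapes and layer sizes are illustrative only.
class DeepSpeech2Like(nn.Module):
    def __init__(self, n_mels=161, rnn_hidden=512, n_chars=29):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(11, 41), stride=(2, 2), padding=(5, 20)),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(11, 21), stride=(1, 2), padding=(5, 10)),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(input_size=32 * ((n_mels + 1) // 2), hidden_size=rnn_hidden,
                          num_layers=5, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * rnn_hidden, n_chars)

    def forward(self, x):                        # x: (batch, 1, n_mels, time)
        feats = self.conv(x)                     # (batch, 32, freq', time')
        b, c, f, t = feats.shape
        feats = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)
        out, _ = self.rnn(feats)
        return self.fc(out)                      # per-frame character logits

model = DeepSpeech2Like()
# model.load_state_dict(torch.load("librispeech_pretrained.pt"))  # hypothetical pretrained checkpoint

# Retain the weights of the initial convolutional layers: freeze them so only the
# deeper layers (RNN and fully connected) receive gradient updates during fine-tuning.
for p in model.conv.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CTCLoss(blank=0, zero_infinity=True)  # stand-in for the paper's custom loss function
```

Setting `requires_grad = False` on the convolutional parameters means the optimizer only updates the deeper layers, which is the configuration the abstract reports as best for South Asian accents.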
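The three evaluation metrics named in the abstract (WER, MER, and WIL) can be computed with the open-source `jiwer` package, which exposes `wer`, `mer`, and `wil` helpers. This is one common implementation, not necessarily the one the authors used, and the sentences below are made-up examples.

```python
# pip install jiwer
import jiwer

reference  = "turn on the lights in the living room"
hypothesis = "turn on de light in the living room"   # a plausible accented-ASR error (made up)

print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")  # word error rate
print(f"MER: {jiwer.mer(reference, hypothesis):.3f}")  # match error rate
print(f"WIL: {jiwer.wil(reference, hypothesis):.3f}")  # word information lost
```

Lower is better for all three metrics; the paper's reported WER of 18.08% corresponds to 0.1808 on this scale.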
Published in: | Mathematical Problems in Engineering, 2022-10, Vol. 2022, p. 1-12 |
---|---|
Main Authors: | Hassan, Muhammad Ahmed; Rehmat, Asim; Ghani Khan, Muhammad Usman; Yousaf, Muhammad Haroon |
Contributor: | Ijaz, Muhammad Fazal |
Format: | Article |
Language: | English |
Subjects: | Accentuation; Accuracy; Automatic speech recognition; Automation; Autonomous vehicles; Computer mediated communication; Datasets; Deep learning; English language; Error analysis; Errors; Human-computer interaction; Learning transfer; Machine learning; Neural networks; R&D; Research & development; Smartphones; Speaking; Speech; Speech recognition; Voice communication; Voice recognition; Words (language) |
DOI: | 10.1155/2022/6825555 |
ISSN: | 1024-123X |
EISSN: | 1563-5147 |
Publisher: | New York: Hindawi |
Online Access: | Full text |