Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2

Automatic speech recognition (ASR) has ensured a convenient and fast mode of communication between humans and computers. It has become more accurate over the passage of time. However, in the majority of ASR systems, the models have been trained using native English accents. While they serve best for nat...

Detailed description

Bibliographic details
Published in: Mathematical problems in engineering 2022-10, Vol.2022, p.1-12
Main authors: Hassan, Muhammad Ahmed, Rehmat, Asim, Ghani Khan, Muhammad Usman, Yousaf, Muhammad Haroon
Format: Article
Language: eng
Subject headings:
Online access: Full text
container_end_page 12
container_issue
container_start_page 1
container_title Mathematical problems in engineering
container_volume 2022
creator Hassan, Muhammad Ahmed
Rehmat, Asim
Ghani Khan, Muhammad Usman
Yousaf, Muhammad Haroon
description Automatic speech recognition (ASR) has ensured a convenient and fast mode of communication between humans and computers. It has become more accurate over the passage of time. However, in the majority of ASR systems, the models have been trained using native English accents. While they serve native English speakers best, their accuracy drops drastically for non-native English accents. Our proposed model addresses this limitation. We fine-tuned the DeepSpeech2 model, pretrained on LibriSpeech, a dataset of native English accents. We retrained the model on a subset of the Common Voice dataset containing only South Asian accents, using the proposed novel loss function. We experimented with three different layer configurations of the model to learn the best features for South Asian accents. Three evaluation metrics were used: word error rate (WER), match error rate (MER), and word information loss (WIL). The results show that DeepSpeech2 can perform well for South Asian accents if the weights of the initial convolutional layers are retained while the weights of the deeper layers in the model (i.e., the RNN and fully connected layers) are updated. Our model achieved a WER of 18.08%, the lowest error on non-native English accents in comparison with the original model.
doi_str_mv 10.1155/2022/6825555
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2725129619</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2725129619</sourcerecordid><originalsourceid>FETCH-LOGICAL-c337t-65b25b35a78e461c659759306f19cb55d42098e071ac24283457ebb57176925d3</originalsourceid><addsrcrecordid>eNp9kF9LwzAUxYMoOKdvfoCAj1qXpL1J-zjmXxgIbgPfSpqla4ZNatIqfnszumfvy71cfvcc7kHompJ7SgFmjDA24zmDWCdoQoGnCdBMnMaZsCyhLP04Rxch7AlhFGg-Qc1r23n3rVtte2wsng-9a2VvFF51WqsGv2vldtb0xlnsarxyQ9_geTAyskodrjbB2B1ee2lDrT1eauntYRPpB627UYddorNafgZ9dexTtHl6XC9ekuXb8-tivkxUmoo-4VAxqFKQItcZp4pDIaBICa9poSqAbcZIkWsiqFQsY3magdBVBYIKXjDYplN0M-rGt74GHfpy7wZvo2XJBAPKCk6LSN2NlPIuBK_rsvOmlf63pKQ8ZFkesiyPWUb8dsQbY7fyx_xP_wHAf3HZ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2725129619</pqid></control><display><type>article</type><title>Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2</title><source>Wiley Online Library Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>Alma/SFX Local Collection</source><creator>Hassan, Muhammad Ahmed ; Rehmat, Asim ; Ghani Khan, Muhammad Usman ; Yousaf, Muhammad Haroon</creator><contributor>Ijaz, Muhammad Fazal ; Muhammad Fazal Ijaz</contributor><creatorcontrib>Hassan, Muhammad Ahmed ; Rehmat, Asim ; Ghani Khan, Muhammad Usman ; Yousaf, Muhammad Haroon ; Ijaz, Muhammad Fazal ; Muhammad Fazal Ijaz</creatorcontrib><description>Automatic speech recognition (ASR) has ensured a convenient and fast mode of communication between humans and computers. It has become more accurate over the passage of time. However, in majority of ASR systems, the models have been trained using native English accents. While they serve best for native English speakers, their accuracy drops drastically for non-native English accents. 
Our proposed model covers this limitation for non-native English accents. We fine-tuned the DeepSpeech2 model, pretrained on the native English accent dataset by LibriSpeech. We retrain the model on a subset of the common voice dataset having only South Asian accents using the proposed novel loss function. We experimented with three different layer configurations of model to learn the best features for South Asian accents. Three evaluation parameters, word error rate (WER), match error rate (MER), and word information loss (WIL) were used. The results show that DeepSpeech2 can perform significantly well for South Asian accents if the weights of initial convolutional layers are retained while updating weights of deeper layers in the model (i.e., RNN and fully connected layers). Our model gave WER of 18.08%, which is the minimum error achieved for non-native English accents in comparison with the original model.</description><identifier>ISSN: 1024-123X</identifier><identifier>EISSN: 1563-5147</identifier><identifier>DOI: 10.1155/2022/6825555</identifier><language>eng</language><publisher>New York: Hindawi</publisher><subject>Accentuation ; Accuracy ; Automatic speech recognition ; Automation ; Autonomous vehicles ; Computer mediated communication ; Datasets ; Deep learning ; English language ; Error analysis ; Errors ; Human-computer interaction ; Learning transfer ; Machine learning ; Neural networks ; R&amp;D ; Research &amp; development ; Smartphones ; Speaking ; Speech ; Speech recognition ; Voice communication ; Voice recognition ; Words (language)</subject><ispartof>Mathematical problems in engineering, 2022-10, Vol.2022, p.1-12</ispartof><rights>Copyright © 2022 Muhammad Ahmed Hassan et al.</rights><rights>Copyright © 2022 Muhammad Ahmed Hassan et al. 
This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c337t-65b25b35a78e461c659759306f19cb55d42098e071ac24283457ebb57176925d3</citedby><cites>FETCH-LOGICAL-c337t-65b25b35a78e461c659759306f19cb55d42098e071ac24283457ebb57176925d3</cites><orcidid>0000-0001-6733-2569 ; 0000-0001-8247-7432 ; 0000-0001-8255-1145 ; 0000-0001-6970-9112</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><contributor>Ijaz, Muhammad Fazal</contributor><contributor>Muhammad Fazal Ijaz</contributor><creatorcontrib>Hassan, Muhammad Ahmed</creatorcontrib><creatorcontrib>Rehmat, Asim</creatorcontrib><creatorcontrib>Ghani Khan, Muhammad Usman</creatorcontrib><creatorcontrib>Yousaf, Muhammad Haroon</creatorcontrib><title>Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2</title><title>Mathematical problems in engineering</title><description>Automatic speech recognition (ASR) has ensured a convenient and fast mode of communication between humans and computers. It has become more accurate over the passage of time. However, in majority of ASR systems, the models have been trained using native English accents. While they serve best for native English speakers, their accuracy drops drastically for non-native English accents. 
Our proposed model covers this limitation for non-native English accents. We fine-tuned the DeepSpeech2 model, pretrained on the native English accent dataset by LibriSpeech. We retrain the model on a subset of the common voice dataset having only South Asian accents using the proposed novel loss function. We experimented with three different layer configurations of model to learn the best features for South Asian accents. Three evaluation parameters, word error rate (WER), match error rate (MER), and word information loss (WIL) were used. The results show that DeepSpeech2 can perform significantly well for South Asian accents if the weights of initial convolutional layers are retained while updating weights of deeper layers in the model (i.e., RNN and fully connected layers). Our model gave WER of 18.08%, which is the minimum error achieved for non-native English accents in comparison with the original model.</description><subject>Accentuation</subject><subject>Accuracy</subject><subject>Automatic speech recognition</subject><subject>Automation</subject><subject>Autonomous vehicles</subject><subject>Computer mediated communication</subject><subject>Datasets</subject><subject>Deep learning</subject><subject>English language</subject><subject>Error analysis</subject><subject>Errors</subject><subject>Human-computer interaction</subject><subject>Learning transfer</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>R&amp;D</subject><subject>Research &amp; development</subject><subject>Smartphones</subject><subject>Speaking</subject><subject>Speech</subject><subject>Speech recognition</subject><subject>Voice communication</subject><subject>Voice recognition</subject><subject>Words 
(language)</subject><issn>1024-123X</issn><issn>1563-5147</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RHX</sourceid><sourceid>BENPR</sourceid><recordid>eNp9kF9LwzAUxYMoOKdvfoCAj1qXpL1J-zjmXxgIbgPfSpqla4ZNatIqfnszumfvy71cfvcc7kHompJ7SgFmjDA24zmDWCdoQoGnCdBMnMaZsCyhLP04Rxch7AlhFGg-Qc1r23n3rVtte2wsng-9a2VvFF51WqsGv2vldtb0xlnsarxyQ9_geTAyskodrjbB2B1ee2lDrT1eauntYRPpB627UYddorNafgZ9dexTtHl6XC9ekuXb8-tivkxUmoo-4VAxqFKQItcZp4pDIaBICa9poSqAbcZIkWsiqFQsY3magdBVBYIKXjDYplN0M-rGt74GHfpy7wZvo2XJBAPKCk6LSN2NlPIuBK_rsvOmlf63pKQ8ZFkesiyPWUb8dsQbY7fyx_xP_wHAf3HZ</recordid><startdate>20221004</startdate><enddate>20221004</enddate><creator>Hassan, Muhammad Ahmed</creator><creator>Rehmat, Asim</creator><creator>Ghani Khan, Muhammad Usman</creator><creator>Yousaf, Muhammad Haroon</creator><general>Hindawi</general><general>Hindawi Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope><scope>7TB</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CWDGH</scope><scope>DWQXO</scope><scope>FR3</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>KR7</scope><scope>L6V</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><orcidid>https://orcid.org/0000-0001-6733-2569</orcidid><orcidid>https://orcid.org/0000-0001-8247-7432</orcidid><orcidid>https://orcid.org/0000-0001-8255-1145</orcidid><orcidid>https://orcid.org/0000-0001-6970-9112</orcidid></search><sort><creationdate>20221004</creationdate><title>Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of 
DeepSpeech2</title><author>Hassan, Muhammad Ahmed ; Rehmat, Asim ; Ghani Khan, Muhammad Usman ; Yousaf, Muhammad Haroon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c337t-65b25b35a78e461c659759306f19cb55d42098e071ac24283457ebb57176925d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accentuation</topic><topic>Accuracy</topic><topic>Automatic speech recognition</topic><topic>Automation</topic><topic>Autonomous vehicles</topic><topic>Computer mediated communication</topic><topic>Datasets</topic><topic>Deep learning</topic><topic>English language</topic><topic>Error analysis</topic><topic>Errors</topic><topic>Human-computer interaction</topic><topic>Learning transfer</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>R&amp;D</topic><topic>Research &amp; development</topic><topic>Smartphones</topic><topic>Speaking</topic><topic>Speech</topic><topic>Speech recognition</topic><topic>Voice communication</topic><topic>Voice recognition</topic><topic>Words (language)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hassan, Muhammad Ahmed</creatorcontrib><creatorcontrib>Rehmat, Asim</creatorcontrib><creatorcontrib>Ghani Khan, Muhammad Usman</creatorcontrib><creatorcontrib>Yousaf, Muhammad Haroon</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering 
Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>Middle East &amp; Africa Database</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Civil Engineering Abstracts</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>Mathematical problems in engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hassan, Muhammad Ahmed</au><au>Rehmat, Asim</au><au>Ghani Khan, Muhammad Usman</au><au>Yousaf, Muhammad Haroon</au><au>Ijaz, Muhammad Fazal</au><au>Muhammad Fazal Ijaz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2</atitle><jtitle>Mathematical problems in 
engineering</jtitle><date>2022-10-04</date><risdate>2022</risdate><volume>2022</volume><spage>1</spage><epage>12</epage><pages>1-12</pages><issn>1024-123X</issn><eissn>1563-5147</eissn><abstract>Automatic speech recognition (ASR) has ensured a convenient and fast mode of communication between humans and computers. It has become more accurate over the passage of time. However, in majority of ASR systems, the models have been trained using native English accents. While they serve best for native English speakers, their accuracy drops drastically for non-native English accents. Our proposed model covers this limitation for non-native English accents. We fine-tuned the DeepSpeech2 model, pretrained on the native English accent dataset by LibriSpeech. We retrain the model on a subset of the common voice dataset having only South Asian accents using the proposed novel loss function. We experimented with three different layer configurations of model to learn the best features for South Asian accents. Three evaluation parameters, word error rate (WER), match error rate (MER), and word information loss (WIL) were used. The results show that DeepSpeech2 can perform significantly well for South Asian accents if the weights of initial convolutional layers are retained while updating weights of deeper layers in the model (i.e., RNN and fully connected layers). Our model gave WER of 18.08%, which is the minimum error achieved for non-native English accents in comparison with the original model.</abstract><cop>New York</cop><pub>Hindawi</pub><doi>10.1155/2022/6825555</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-6733-2569</orcidid><orcidid>https://orcid.org/0000-0001-8247-7432</orcidid><orcidid>https://orcid.org/0000-0001-8255-1145</orcidid><orcidid>https://orcid.org/0000-0001-6970-9112</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1024-123X
ispartof Mathematical problems in engineering, 2022-10, Vol.2022, p.1-12
issn 1024-123X
1563-5147
language eng
recordid cdi_proquest_journals_2725129619
source Wiley Online Library Open Access; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection
subjects Accentuation
Accuracy
Automatic speech recognition
Automation
Autonomous vehicles
Computer mediated communication
Datasets
Deep learning
English language
Error analysis
Errors
Human-computer interaction
Learning transfer
Machine learning
Neural networks
R&D
Research & development
Smartphones
Speaking
Speech
Speech recognition
Voice communication
Voice recognition
Words (language)
title Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T01%3A51%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improvement%20in%20Automatic%20Speech%20Recognition%20of%20South%20Asian%20Accent%20Using%20Transfer%20Learning%20of%20DeepSpeech2&rft.jtitle=Mathematical%20problems%20in%20engineering&rft.au=Hassan,%20Muhammad%20Ahmed&rft.date=2022-10-04&rft.volume=2022&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=1024-123X&rft.eissn=1563-5147&rft_id=info:doi/10.1155/2022/6825555&rft_dat=%3Cproquest_cross%3E2725129619%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2725129619&rft_id=info:pmid/&rfr_iscdi=true
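
Note on the evaluation metrics named in the abstract (WER, MER, WIL): the record does not give their formulas, so the following is only a minimal sketch that assumes the commonly used word-level definitions based on hits (H), substitutions (S), deletions (D), and insertions (I) from a Levenshtein alignment: WER = (S+D+I)/(H+S+D), MER = (S+D+I)/(H+S+D+I), and WIL = 1 - H²/((H+S+D)(H+S+I)). The function names `align_counts` and `error_rates` are hypothetical and are not taken from the paper; the authors' actual evaluation code may differ.

```python
def align_counts(reference, hypothesis):
    """Count hits, substitutions, deletions, insertions (word level).

    Hypothetical helper: a plain Levenshtein alignment over
    whitespace-separated words, followed by a backtrace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    n, m = len(ref), len(hyp)

    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner and classify each step.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1  # reference word missing from the hypothesis
            i -= 1
        else:
            ins += 1   # extra word in the hypothesis
            j -= 1
    return hits, subs, dels, ins


def error_rates(reference, hypothesis):
    """Return (WER, MER, WIL) under the assumed standard definitions."""
    h, s, d, i = align_counts(reference, hypothesis)
    n_ref = h + s + d          # words in the reference
    n_hyp = h + s + i          # words in the hypothesis
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    errors = s + d + i
    wer = errors / n_ref
    mer = errors / (h + s + d + i)
    # With an empty hypothesis no information is preserved, so WIL is 1.
    wil = 1.0 if n_hyp == 0 else 1.0 - (h * h) / (n_ref * n_hyp)
    return wer, mer, wil


if __name__ == "__main__":
    wer, mer, wil = error_rates("the cat sat on the mat", "the cat sit on mat")
    print(f"WER={wer:.4f} MER={mer:.4f} WIL={wil:.4f}")
```

In the illustrative sentence pair, the alignment has four hits, one substitution ("sat" to "sit"), and one deletion ("the"). WER can exceed 1 when the hypothesis contains many insertions, whereas MER and WIL stay within [0, 1], which is one reason to report all three.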