Anomaly detection with a variational autoencoder for Arabic mispronunciation detection
Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applicat...
Published in: | International journal of speech technology 2024-06, Vol.27 (2), p.413-424 |
---|---|
Main authors: | Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 424 |
---|---|
container_issue | 2 |
container_start_page | 413 |
container_title | International journal of speech technology |
container_volume | 27 |
creator | Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima |
description | Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. Herein, mispronunciation detection is a key component mainly addressed as a classification problem. Meanwhile, deep learning (DL) advances have promoted these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. However, the effectiveness of the DL models is hindered by many shortcomings, such as the scarcity of labeled data. To address this issue, the paper assumes an anomaly detection-based mispronunciation detection approach. It utilizes a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of the correct pronunciations, standing for “normal data,” and is expected to detect mispronunciations, standing for “abnormal data” during the test stage. Our proposition was evaluated in the context of Arabic pronunciation learning through the ASMDD Arabic dataset. The obtained results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolution neural networks used for Arabic mispronunciation detection. |
doi_str_mv | 10.1007/s10772-024-10113-9 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3083661841</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3083661841</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1859-abc1443b9ccfb3e6e1bb57e12874a585166a279e1286c73366ed5050ccfd338c3</originalsourceid><addsrcrecordid>eNp9kEtLAzEUhYMoWKt_wFXAdTQ3mTxmWYovENyo25DJZDSlndRkRum_N3WE7lzdy-E7h3sPQpdAr4FSdZOBKsUIZRUBCsBJfYRmIIqkAehx2bkGwiqQp-gs5xWltFY1m6G3RR83dr3DrR-8G0Ls8XcYPrDFXzYFuxfsGttxiL53sfUJdzHhRbJNcHgT8jbFfuzdRB5CztFJZ9fZX_zNOXq9u31ZPpCn5_vH5eKJONCiJrZxUFW8qZ3rGu6lh6YRygPTqrJCC5DSMlXvBekU51L6VlBBC95yrh2fo6spt9zxOfo8mFUcUzk5G0514UFXUCg2US7FnJPvzDaFjU07A9Ts-zNTf6b0Z377M3Ux8clUfgz9u0-H6H9cPwNcc7g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3083661841</pqid></control><display><type>article</type><title>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</title><source>SpringerLink Journals</source><creator>Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima</creator><creatorcontrib>Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima</creatorcontrib><description>Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. Herein, mispronunciation detection is a key component mainly addressed as a classification problem. Meanwhile, deep learning (DL) advances have promoted these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. However, the effectiveness of the DL models is hindered by many shortcomings, such as the scarcity of labeled data. 
To address this issue, the paper assumes an anomaly detection-based mispronunciation detection approach. It utilizes a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of the correct pronunciations, standing for “normal data,” and is expected to detect mispronunciations, standing for “abnormal data” during the test stage. Our proposition was evaluated in the context of Arabic pronunciation learning through the ASMDD Arabic dataset. The obtained results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolution neural networks used for Arabic mispronunciation detection.</description><identifier>ISSN: 1381-2416</identifier><identifier>EISSN: 1572-8110</identifier><identifier>DOI: 10.1007/s10772-024-10113-9</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accentuation ; Anomalies ; Applications programs ; Arabic language ; Artificial Intelligence ; Artificial neural networks ; Computer assisted language learning ; Deep learning ; Engineering ; Foreign languages ; Language styles ; Machine learning ; Mobile computing ; Neural networks ; Pronunciation accuracy ; Second language learning ; Signal,Image and Speech Processing ; Social media ; Social Sciences ; Software ; Speech errors</subject><ispartof>International journal of speech technology, 2024-06, Vol.27 (2), p.413-424</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c1859-abc1443b9ccfb3e6e1bb57e12874a585166a279e1286c73366ed5050ccfd338c3</cites><orcidid>0000-0003-3834-2623 ; 0000-0002-1519-7075</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10772-024-10113-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10772-024-10113-9$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Lounis, Meriem</creatorcontrib><creatorcontrib>Dendani, Bilal</creatorcontrib><creatorcontrib>Bahi, Halima</creatorcontrib><title>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</title><title>International journal of speech technology</title><addtitle>Int J Speech Technol</addtitle><description>Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. Herein, mispronunciation detection is a key component mainly addressed as a classification problem. Meanwhile, deep learning (DL) advances have promoted these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. 
However, the effectiveness of the DL models is hindered by many shortcomings, such as the scarcity of labeled data. To address this issue, the paper assumes an anomaly detection-based mispronunciation detection approach. It utilizes a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of the correct pronunciations, standing for “normal data,” and is expected to detect mispronunciations, standing for “abnormal data” during the test stage. Our proposition was evaluated in the context of Arabic pronunciation learning through the ASMDD Arabic dataset. The obtained results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolution neural networks used for Arabic mispronunciation detection.</description><subject>Accentuation</subject><subject>Anomalies</subject><subject>Applications programs</subject><subject>Arabic language</subject><subject>Artificial Intelligence</subject><subject>Artificial neural networks</subject><subject>Computer assisted language learning</subject><subject>Deep learning</subject><subject>Engineering</subject><subject>Foreign languages</subject><subject>Language styles</subject><subject>Machine learning</subject><subject>Mobile computing</subject><subject>Neural networks</subject><subject>Pronunciation accuracy</subject><subject>Second language learning</subject><subject>Signal,Image and Speech Processing</subject><subject>Social media</subject><subject>Social Sciences</subject><subject>Software</subject><subject>Speech 
errors</subject><issn>1381-2416</issn><issn>1572-8110</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLAzEUhYMoWKt_wFXAdTQ3mTxmWYovENyo25DJZDSlndRkRum_N3WE7lzdy-E7h3sPQpdAr4FSdZOBKsUIZRUBCsBJfYRmIIqkAehx2bkGwiqQp-gs5xWltFY1m6G3RR83dr3DrR-8G0Ls8XcYPrDFXzYFuxfsGttxiL53sfUJdzHhRbJNcHgT8jbFfuzdRB5CztFJZ9fZX_zNOXq9u31ZPpCn5_vH5eKJONCiJrZxUFW8qZ3rGu6lh6YRygPTqrJCC5DSMlXvBekU51L6VlBBC95yrh2fo6spt9zxOfo8mFUcUzk5G0514UFXUCg2US7FnJPvzDaFjU07A9Ts-zNTf6b0Z377M3Ux8clUfgz9u0-H6H9cPwNcc7g</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Lounis, Meriem</creator><creator>Dendani, Bilal</creator><creator>Bahi, Halima</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope><orcidid>https://orcid.org/0000-0003-3834-2623</orcidid><orcidid>https://orcid.org/0000-0002-1519-7075</orcidid></search><sort><creationdate>20240601</creationdate><title>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</title><author>Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1859-abc1443b9ccfb3e6e1bb57e12874a585166a279e1286c73366ed5050ccfd338c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accentuation</topic><topic>Anomalies</topic><topic>Applications programs</topic><topic>Arabic language</topic><topic>Artificial Intelligence</topic><topic>Artificial neural networks</topic><topic>Computer assisted language learning</topic><topic>Deep learning</topic><topic>Engineering</topic><topic>Foreign languages</topic><topic>Language styles</topic><topic>Machine learning</topic><topic>Mobile computing</topic><topic>Neural networks</topic><topic>Pronunciation accuracy</topic><topic>Second language 
learning</topic><topic>Signal,Image and Speech Processing</topic><topic>Social media</topic><topic>Social Sciences</topic><topic>Software</topic><topic>Speech errors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lounis, Meriem</creatorcontrib><creatorcontrib>Dendani, Bilal</creatorcontrib><creatorcontrib>Bahi, Halima</creatorcontrib><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>International journal of speech technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lounis, Meriem</au><au>Dendani, Bilal</au><au>Bahi, Halima</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</atitle><jtitle>International journal of speech technology</jtitle><stitle>Int J Speech Technol</stitle><date>2024-06-01</date><risdate>2024</risdate><volume>27</volume><issue>2</issue><spage>413</spage><epage>424</epage><pages>413-424</pages><issn>1381-2416</issn><eissn>1572-8110</eissn><abstract>Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. Herein, mispronunciation detection is a key component mainly addressed as a classification problem. Meanwhile, deep learning (DL) advances have promoted these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. However, the effectiveness of the DL models is hindered by many shortcomings, such as the scarcity of labeled data. To address this issue, the paper assumes an anomaly detection-based mispronunciation detection approach. 
It utilizes a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of the correct pronunciations, standing for “normal data,” and is expected to detect mispronunciations, standing for “abnormal data” during the test stage. Our proposition was evaluated in the context of Arabic pronunciation learning through the ASMDD Arabic dataset. The obtained results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolution neural networks used for Arabic mispronunciation detection.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10772-024-10113-9</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-3834-2623</orcidid><orcidid>https://orcid.org/0000-0002-1519-7075</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1381-2416 |
ispartof | International journal of speech technology, 2024-06, Vol.27 (2), p.413-424 |
issn | 1381-2416 ; 1572-8110 |
language | eng |
recordid | cdi_proquest_journals_3083661841 |
source | SpringerLink Journals |
subjects | Accentuation ; Anomalies ; Applications programs ; Arabic language ; Artificial Intelligence ; Artificial neural networks ; Computer assisted language learning ; Deep learning ; Engineering ; Foreign languages ; Language styles ; Machine learning ; Mobile computing ; Neural networks ; Pronunciation accuracy ; Second language learning ; Signal,Image and Speech Processing ; Social media ; Social Sciences ; Software ; Speech errors |
title | Anomaly detection with a variational autoencoder for Arabic mispronunciation detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T08%3A36%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Anomaly%20detection%20with%20a%20variational%20autoencoder%20for%20Arabic%20mispronunciation%20detection&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Lounis,%20Meriem&rft.date=2024-06-01&rft.volume=27&rft.issue=2&rft.spage=413&rft.epage=424&rft.pages=413-424&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-024-10113-9&rft_dat=%3Cproquest_cross%3E3083661841%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3083661841&rft_id=info:pmid/&rfr_iscdi=true |
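
The description field above outlines the general idea: a VAE is fit on correct pronunciations only, and utterances that the model explains poorly are flagged as mispronunciations. The record does not give the authors' scoring function, features, or threshold, so the following is only a minimal sketch under stated assumptions: the score is taken to be squared reconstruction error plus the KL term of a diagonal-Gaussian encoder, and the decision threshold is a quantile of scores on held-out correct pronunciations. All function names here are hypothetical, and the encoder/decoder outputs (`x_hat`, `mu`, `logvar`) are assumed to come from some already trained model.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Sum of squared differences between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def anomaly_score(x, x_hat, mu, logvar) -> float:
    """Assumed score: reconstruction error plus KL term (an ELBO-style loss).

    Higher values mean the sample is less well explained by a model that
    was trained on "normal" (correctly pronounced) data only.
    """
    return reconstruction_error(x, x_hat) + kl_to_standard_normal(mu, logvar)


def fit_threshold(normal_scores: Sequence[float], quantile: float = 0.95) -> float:
    """Pick a threshold as an empirical quantile of scores on normal data.

    The 0.95 default is an arbitrary illustration, not a value from the paper.
    """
    ordered = sorted(normal_scores)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


def is_mispronounced(score: float, threshold: float) -> bool:
    """Flag a sample as abnormal when its score exceeds the threshold."""
    return score > threshold
```

In use, one would compute `anomaly_score` for each held-out correct utterance, pass those scores to `fit_threshold`, and then call `is_mispronounced` on the score of each test utterance. Choosing the quantile trades false alarms against missed mispronunciations.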