Anomaly detection with a variational autoencoder for Arabic mispronunciation detection


Bibliographic details
Published in:	International journal of speech technology, 2024-06, Vol.27 (2), p.413-424
Main authors:	Lounis, Meriem; Dendani, Bilal; Bahi, Halima
Format:	Article
Language:	eng
Subjects:	Accentuation; Anomalies; Applications programs; Arabic language; Artificial Intelligence; Artificial neural networks; Computer assisted language learning; Deep learning; Engineering; Foreign languages; Language styles; Machine learning; Mobile computing; Neural networks; Pronunciation accuracy; Second language learning; Signal, Image and Speech Processing; Social media; Social Sciences; Software; Speech errors
Online access:	Full text

container_end_page	424
container_issue	2
container_start_page	413
container_title	International journal of speech technology
container_volume	27
creator	Lounis, Meriem; Dendani, Bilal; Bahi, Halima
description	Computer-assisted language learning (CALL) systems are attracting growing interest and have established a presence in automated foreign-language learning. They enhance traditional learning methods by providing access to various accents and spoken-language styles through websites, mobile applications, and social media. Within these systems, mispronunciation detection is a key component, mainly addressed as a classification problem. Meanwhile, advances in deep learning (DL) have strengthened these systems by training deep neural networks (DNNs) to classify a pronunciation as correct or incorrect. However, the effectiveness of DL models is limited by several shortcomings, such as the scarcity of labeled data. To address this issue, the paper adopts an anomaly-detection-based approach to mispronunciation detection. It uses a variational autoencoder (VAE) with a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of correct pronunciations (the “normal data”); during the test stage, it is expected to detect mispronunciations (the “abnormal data”). The approach was evaluated on Arabic pronunciation learning using the ASMDD Arabic dataset. The results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolutional neural networks used for Arabic mispronunciation detection.
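The abstract describes the test-stage decision only in outline. The Python sketch below shows one generic way a trained VAE can flag anomalies: score each input by an approximate negative evidence lower bound (reconstruction error plus KL divergence) and compare that score with a threshold fitted on correct pronunciations. It is not the authors' implementation. The Gaussian decoder, the use of the posterior mean instead of sampling, the 95% quantile threshold, the two-dimensional feature vectors, and every helper name (`anomaly_score`, `fit_threshold`, `is_mispronounced`, `toy_encode`, `toy_decode`) are illustrative assumptions; the record does not say which acoustic features or scoring rule the paper uses.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical interfaces for an already-trained VAE (not the paper's code):
# the encoder returns the mean and log-variance of q(z|x); the decoder maps z to x_hat.
Encoder = Callable[[Vector], Tuple[List[float], List[float]]]
Decoder = Callable[[Vector], List[float]]


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def gaussian_nll(x: Vector, x_hat: Vector, sigma: float = 1.0) -> float:
    """-log N(x; x_hat, sigma^2 I): reconstruction term under an assumed Gaussian decoder."""
    d = len(x)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 0.5 * sq_err / sigma ** 2 + d * math.log(sigma) + 0.5 * d * math.log(2.0 * math.pi)


def anomaly_score(x: Vector, encode: Encoder, decode: Decoder, sigma: float = 1.0) -> float:
    """Approximate negative ELBO; a higher score means x is less likely under the 'normal' model.

    Decodes the posterior mean instead of sampling z (a deterministic shortcut).
    """
    mu, log_var = encode(x)
    x_hat = decode(mu)
    return gaussian_nll(x, x_hat, sigma) + kl_to_standard_normal(mu, log_var)


def fit_threshold(normal_scores: Sequence[float], quantile: float = 0.95) -> float:
    """Return the score at the given quantile of correctly pronounced (normal) samples."""
    ordered = sorted(normal_scores)
    idx = min(len(ordered) - 1, max(0, math.ceil(quantile * len(ordered)) - 1))
    return ordered[idx]


def is_mispronounced(x: Vector, encode: Encoder, decode: Decoder,
                     threshold: float, sigma: float = 1.0) -> bool:
    """Flag x as abnormal (mispronounced) when its score exceeds the threshold."""
    return anomaly_score(x, encode, decode, sigma) > threshold


# Toy stand-ins for trained networks (hypothetical): "correct" feature vectors lie
# on the diagonal x0 == x1, and the decoder can only reproduce points on that line.
def toy_encode(x: Vector) -> Tuple[List[float], List[float]]:
    return [(x[0] + x[1]) / 2.0], [-2.0]


def toy_decode(z: Vector) -> List[float]:
    return [z[0], z[0]]


if __name__ == "__main__":
    normal = [(t, t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    thr = fit_threshold([anomaly_score(x, toy_encode, toy_decode) for x in normal])
    print(is_mispronounced((0.5, 0.5), toy_encode, toy_decode, thr))   # False
    print(is_mispronounced((2.0, -2.0), toy_encode, toy_decode, thr))  # True
```

Adding the KL term to the reconstruction error is one way a VAE score can differ from a plain autoencoder's reconstruction-error score. A real system would replace the toy functions with trained networks and tune the threshold on held-out data.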
doi_str_mv	10.1007/s10772-024-10113-9
format	Article
publisher	New York: Springer US
stitle	Int J Speech Technol
rights	The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
orcid	https://orcid.org/0000-0003-3834-2623 https://orcid.org/0000-0002-1519-7075
peer_reviewed	yes
tpages	12
fulltext	fulltext
identifier	ISSN: 1381-2416
ispartof	International journal of speech technology, 2024-06, Vol.27 (2), p.413-424
issn	1381-2416 1572-8110
language	eng
recordid	cdi_proquest_journals_3083661841
source	SpringerLink Journals
subjects	Accentuation; Anomalies; Applications programs; Arabic language; Artificial Intelligence; Artificial neural networks; Computer assisted language learning; Deep learning; Engineering; Foreign languages; Language styles; Machine learning; Mobile computing; Neural networks; Pronunciation accuracy; Second language learning; Signal, Image and Speech Processing; Social media; Social Sciences; Software; Speech errors
title	Anomaly detection with a variational autoencoder for Arabic mispronunciation detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T08%3A36%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Anomaly%20detection%20with%20a%20variational%20autoencoder%20for%20Arabic%20mispronunciation%20detection&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Lounis,%20Meriem&rft.date=2024-06-01&rft.volume=27&rft.issue=2&rft.spage=413&rft.epage=424&rft.pages=413-424&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-024-10113-9&rft_dat=%3Cproquest_cross%3E3083661841%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3083661841&rft_id=info:pmid/&rfr_iscdi=true