Anomaly detection with a variational autoencoder for Arabic mispronunciation detection

Computer-assisted language learning (CALL) systems are attracting growing interest and have established a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applicat...

Detailed description

Bibliographic details
Published in: International journal of speech technology 2024-06, Vol.27 (2), p.413-424
Main authors: Lounis, Meriem; Dendani, Bilal; Bahi, Halima
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 424
container_issue 2
container_start_page 413
container_title International journal of speech technology
container_volume 27
creator Lounis, Meriem
Dendani, Bilal
Bahi, Halima
description Computer-assisted language learning (CALL) systems are attracting growing interest and have established a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. In these systems, mispronunciation detection is a key component and is mainly addressed as a classification problem. Advances in deep learning (DL) have improved these systems by training deep neural networks (DNNs) to classify a pronunciation as correct or incorrect. However, the effectiveness of DL models is limited by several shortcomings, such as the scarcity of labeled data. To address this issue, the paper adopts an anomaly detection-based approach to mispronunciation detection. It uses a variational autoencoder (VAE), relying on a density-based method, to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of correct pronunciations, which stand for “normal data,” and is expected to detect mispronunciations, which stand for “abnormal data,” during the test stage. The approach was evaluated in the context of Arabic pronunciation learning on the ASMDD Arabic dataset. The results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolutional neural networks used for Arabic mispronunciation detection.
doi_str_mv 10.1007/s10772-024-10113-9
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3083661841</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3083661841</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1859-abc1443b9ccfb3e6e1bb57e12874a585166a279e1286c73366ed5050ccfd338c3</originalsourceid><addsrcrecordid>eNp9kEtLAzEUhYMoWKt_wFXAdTQ3mTxmWYovENyo25DJZDSlndRkRum_N3WE7lzdy-E7h3sPQpdAr4FSdZOBKsUIZRUBCsBJfYRmIIqkAehx2bkGwiqQp-gs5xWltFY1m6G3RR83dr3DrR-8G0Ls8XcYPrDFXzYFuxfsGttxiL53sfUJdzHhRbJNcHgT8jbFfuzdRB5CztFJZ9fZX_zNOXq9u31ZPpCn5_vH5eKJONCiJrZxUFW8qZ3rGu6lh6YRygPTqrJCC5DSMlXvBekU51L6VlBBC95yrh2fo6spt9zxOfo8mFUcUzk5G0514UFXUCg2US7FnJPvzDaFjU07A9Ts-zNTf6b0Z377M3Ux8clUfgz9u0-H6H9cPwNcc7g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3083661841</pqid></control><display><type>article</type><title>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</title><source>SpringerLink Journals</source><creator>Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima</creator><creatorcontrib>Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima</creatorcontrib><description>Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. Herein, mispronunciation detection is a key component mainly addressed as a classification problem. Meanwhile, deep learning (DL) advances have promoted these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. However, the effectiveness of the DL models is hindered by many shortcomings, such as the scarcity of labeled data. 
To address this issue, the paper assumes an anomaly detection-based mispronunciation detection approach. It utilizes a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of the correct pronunciations, standing for “normal data,” and is expected to detect mispronunciations, standing for “abnormal data” during the test stage. Our proposition was evaluated in the context of Arabic pronunciation learning through the ASMDD Arabic dataset. The obtained results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolution neural networks used for Arabic mispronunciation detection.</description><identifier>ISSN: 1381-2416</identifier><identifier>EISSN: 1572-8110</identifier><identifier>DOI: 10.1007/s10772-024-10113-9</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accentuation ; Anomalies ; Applications programs ; Arabic language ; Artificial Intelligence ; Artificial neural networks ; Computer assisted language learning ; Deep learning ; Engineering ; Foreign languages ; Language styles ; Machine learning ; Mobile computing ; Neural networks ; Pronunciation accuracy ; Second language learning ; Signal,Image and Speech Processing ; Social media ; Social Sciences ; Software ; Speech errors</subject><ispartof>International journal of speech technology, 2024-06, Vol.27 (2), p.413-424</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c1859-abc1443b9ccfb3e6e1bb57e12874a585166a279e1286c73366ed5050ccfd338c3</cites><orcidid>0000-0003-3834-2623 ; 0000-0002-1519-7075</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10772-024-10113-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10772-024-10113-9$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Lounis, Meriem</creatorcontrib><creatorcontrib>Dendani, Bilal</creatorcontrib><creatorcontrib>Bahi, Halima</creatorcontrib><title>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</title><title>International journal of speech technology</title><addtitle>Int J Speech Technol</addtitle><description>Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. Herein, mispronunciation detection is a key component mainly addressed as a classification problem. Meanwhile, deep learning (DL) advances have promoted these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. 
However, the effectiveness of the DL models is hindered by many shortcomings, such as the scarcity of labeled data. To address this issue, the paper assumes an anomaly detection-based mispronunciation detection approach. It utilizes a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of the correct pronunciations, standing for “normal data,” and is expected to detect mispronunciations, standing for “abnormal data” during the test stage. Our proposition was evaluated in the context of Arabic pronunciation learning through the ASMDD Arabic dataset. The obtained results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolution neural networks used for Arabic mispronunciation detection.</description><subject>Accentuation</subject><subject>Anomalies</subject><subject>Applications programs</subject><subject>Arabic language</subject><subject>Artificial Intelligence</subject><subject>Artificial neural networks</subject><subject>Computer assisted language learning</subject><subject>Deep learning</subject><subject>Engineering</subject><subject>Foreign languages</subject><subject>Language styles</subject><subject>Machine learning</subject><subject>Mobile computing</subject><subject>Neural networks</subject><subject>Pronunciation accuracy</subject><subject>Second language learning</subject><subject>Signal,Image and Speech Processing</subject><subject>Social media</subject><subject>Social Sciences</subject><subject>Software</subject><subject>Speech 
errors</subject><issn>1381-2416</issn><issn>1572-8110</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLAzEUhYMoWKt_wFXAdTQ3mTxmWYovENyo25DJZDSlndRkRum_N3WE7lzdy-E7h3sPQpdAr4FSdZOBKsUIZRUBCsBJfYRmIIqkAehx2bkGwiqQp-gs5xWltFY1m6G3RR83dr3DrR-8G0Ls8XcYPrDFXzYFuxfsGttxiL53sfUJdzHhRbJNcHgT8jbFfuzdRB5CztFJZ9fZX_zNOXq9u31ZPpCn5_vH5eKJONCiJrZxUFW8qZ3rGu6lh6YRygPTqrJCC5DSMlXvBekU51L6VlBBC95yrh2fo6spt9zxOfo8mFUcUzk5G0514UFXUCg2US7FnJPvzDaFjU07A9Ts-zNTf6b0Z377M3Ux8clUfgz9u0-H6H9cPwNcc7g</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Lounis, Meriem</creator><creator>Dendani, Bilal</creator><creator>Bahi, Halima</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope><orcidid>https://orcid.org/0000-0003-3834-2623</orcidid><orcidid>https://orcid.org/0000-0002-1519-7075</orcidid></search><sort><creationdate>20240601</creationdate><title>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</title><author>Lounis, Meriem ; Dendani, Bilal ; Bahi, Halima</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1859-abc1443b9ccfb3e6e1bb57e12874a585166a279e1286c73366ed5050ccfd338c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accentuation</topic><topic>Anomalies</topic><topic>Applications programs</topic><topic>Arabic language</topic><topic>Artificial Intelligence</topic><topic>Artificial neural networks</topic><topic>Computer assisted language learning</topic><topic>Deep learning</topic><topic>Engineering</topic><topic>Foreign languages</topic><topic>Language styles</topic><topic>Machine learning</topic><topic>Mobile computing</topic><topic>Neural networks</topic><topic>Pronunciation accuracy</topic><topic>Second language 
learning</topic><topic>Signal,Image and Speech Processing</topic><topic>Social media</topic><topic>Social Sciences</topic><topic>Software</topic><topic>Speech errors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lounis, Meriem</creatorcontrib><creatorcontrib>Dendani, Bilal</creatorcontrib><creatorcontrib>Bahi, Halima</creatorcontrib><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>International journal of speech technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lounis, Meriem</au><au>Dendani, Bilal</au><au>Bahi, Halima</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Anomaly detection with a variational autoencoder for Arabic mispronunciation detection</atitle><jtitle>International journal of speech technology</jtitle><stitle>Int J Speech Technol</stitle><date>2024-06-01</date><risdate>2024</risdate><volume>27</volume><issue>2</issue><spage>413</spage><epage>424</epage><pages>413-424</pages><issn>1381-2416</issn><eissn>1572-8110</eissn><abstract>Computer-assisted language learning (CALL) systems increasingly arouse a significant interest and establish a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. Herein, mispronunciation detection is a key component mainly addressed as a classification problem. Meanwhile, deep learning (DL) advances have promoted these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. However, the effectiveness of the DL models is hindered by many shortcomings, such as the scarcity of labeled data. To address this issue, the paper assumes an anomaly detection-based mispronunciation detection approach. 
It utilizes a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of the correct pronunciations, standing for “normal data,” and is expected to detect mispronunciations, standing for “abnormal data” during the test stage. Our proposition was evaluated in the context of Arabic pronunciation learning through the ASMDD Arabic dataset. The obtained results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolution neural networks used for Arabic mispronunciation detection.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10772-024-10113-9</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-3834-2623</orcidid><orcidid>https://orcid.org/0000-0002-1519-7075</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1381-2416
ispartof International journal of speech technology, 2024-06, Vol.27 (2), p.413-424
issn 1381-2416
1572-8110
language eng
recordid cdi_proquest_journals_3083661841
source SpringerLink Journals
subjects Accentuation
Anomalies
Applications programs
Arabic language
Artificial Intelligence
Artificial neural networks
Computer assisted language learning
Deep learning
Engineering
Foreign languages
Language styles
Machine learning
Mobile computing
Neural networks
Pronunciation accuracy
Second language learning
Signal,Image and Speech Processing
Social media
Social Sciences
Software
Speech errors
title Anomaly detection with a variational autoencoder for Arabic mispronunciation detection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T08%3A36%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Anomaly%20detection%20with%20a%20variational%20autoencoder%20for%20Arabic%20mispronunciation%20detection&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Lounis,%20Meriem&rft.date=2024-06-01&rft.volume=27&rft.issue=2&rft.spage=413&rft.epage=424&rft.pages=413-424&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-024-10113-9&rft_dat=%3Cproquest_cross%3E3083661841%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3083661841&rft_id=info:pmid/&rfr_iscdi=true
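
The description field above outlines the general idea of VAE-based anomaly detection: fit the model on correct pronunciations only, then flag test items that the model explains poorly. The following is a minimal sketch of the scoring and thresholding step under stated assumptions, not the authors' implementation. The record does not say which score or threshold the paper uses; the negative-ELBO-style score (squared reconstruction error plus KL divergence to a standard normal prior), the quantile-based threshold, and all function names and toy numbers here are hypothetical choices made for illustration. The encoder and decoder themselves are omitted; their outputs are represented by plain lists.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a single sample."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def squared_error(x, x_hat):
    """Sum of squared differences between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def anomaly_score(x, x_hat, mu, logvar, beta=1.0):
    """Assumed score: reconstruction error plus beta-weighted KL term.

    Higher values mean the sample looks less like the training ("normal") data.
    """
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, logvar)


def fit_threshold(normal_scores, quantile=0.95):
    """Pick a threshold as an empirical quantile of scores on normal data."""
    ordered = sorted(normal_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_mispronounced(score, threshold):
    """Flag a sample as abnormal when its score exceeds the threshold."""
    return score > threshold


# Toy values standing in for features, reconstructions, and encoder outputs
# (illustrative only; not taken from the paper or the ASMDD dataset).
train_items = [
    ([0.0, 1.0], [0.1, 0.9], [0.0, 0.1], [0.0, 0.0]),
    ([1.0, 0.0], [0.9, 0.1], [0.1, 0.0], [0.0, 0.0]),
    ([0.5, 0.5], [0.5, 0.4], [0.0, 0.0], [0.0, 0.0]),
]
train_scores = [anomaly_score(x, x_hat, mu, lv) for x, x_hat, mu, lv in train_items]
threshold = fit_threshold(train_scores, quantile=0.95)

# A poorly reconstructed sample whose latent code lies far from the prior.
test_score = anomaly_score([0.0, 1.0], [0.9, 0.1], [2.0, -2.0], [0.0, 0.0])
print(is_mispronounced(test_score, threshold))
```

In this sketch the threshold is derived from correct pronunciations alone, which mirrors the motivation stated in the description: no labeled mispronunciations are needed at training time.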