Anomaly detection with a variational autoencoder for Arabic mispronunciation detection

Bibliographic details
Published in: International journal of speech technology 2024, Vol.27 (2), p.413-424
Main authors: Lounis, Meriem, Dendani, Bilal, Bahi, Halima
Format: Article
Language: eng
Description
Summary: Computer-assisted language learning (CALL) systems are attracting growing interest and becoming established in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. In these systems, mispronunciation detection is a key component and is mainly addressed as a classification problem. Advances in deep learning (DL) have improved these systems by training deep neural networks (DNN) to classify a pronunciation as correct or incorrect. However, the effectiveness of DL models is hindered by several shortcomings, such as the scarcity of labeled data. To address this issue, the paper adopts an anomaly detection-based approach to mispronunciation detection. It uses a variational autoencoder (VAE) relying on a density-based method to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of correct pronunciations, which stand for “normal data,” and is expected to detect mispronunciations, which stand for “abnormal data,” during the test stage. The proposed approach was evaluated in the context of Arabic pronunciation learning on the ASMDD Arabic dataset. The results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolutional neural networks used for Arabic mispronunciation detection.
ISSN: 1381-2416, 1572-8110
DOI:10.1007/s10772-024-10113-9
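
The anomaly-detection idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not state which score or threshold the paper uses, so the score here (a negative ELBO with a unit-variance Gaussian reconstruction term) and the quantile-based threshold are assumptions, and the names `negative_elbo`, `fit_threshold`, and `is_mispronounced` are hypothetical. The encoder outputs (`mu`, `logvar`) and the reconstruction (`x_recon`) are assumed to come from a VAE already trained on correct pronunciations only.

```python
import math
from typing import Sequence


def negative_elbo(
    x: Sequence[float],
    x_recon: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
) -> float:
    """Anomaly score (assumed): reconstruction error plus KL(q(z|x) || N(0, I))."""
    # Squared-error reconstruction term (unit-variance Gaussian likelihood).
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return recon + kl


def fit_threshold(normal_scores: Sequence[float], quantile: float = 0.95) -> float:
    """Empirical quantile of the scores obtained on correct pronunciations."""
    ordered = sorted(normal_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_mispronounced(score: float, threshold: float) -> bool:
    """Flag an utterance whose score exceeds what is typical for normal data."""
    return score > threshold
```

Under these assumptions, the threshold is fitted on scores from held-out correct pronunciations, and any test utterance scoring above it is treated as "abnormal data," so no mispronounced examples are needed for training.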