A Bi-directional Attention Based End to End Mispronunciation Detection and Diagnosis of Mandarin

Bibliographic details
Published in: Journal of Physics: Conference Series, 2023-12, Vol. 2670 (1), p. 12006
Main authors: Liu, Qingqing; Wumaier, Aishan; Shen, Yunfei
Format: Article
Language: English
Description
Abstract: An increasing number of people are learning Mandarin, which makes computer-assisted pronunciation training systems for Mandarin learners increasingly important. A key component of these systems is the technique for detecting and diagnosing mispronunciations, known as Mispronunciation Detection and Diagnosis (MDD). Recently, some end-to-end methods have fused prompt-text features with acoustic features during training and have shown good results. However, these approaches combine the two feature types with only a simple attention mechanism. In this paper, we posit that the influence of text features varies significantly depending on the acoustic characteristics they are mapped to. Furthermore, we propose that the prompt text can guide the model toward an integrated text-audio representation, thereby improving inference quality. Accordingly, this article presents a model for detecting and diagnosing mispronunciations that uses a bidirectional attention mechanism to integrate acoustic and prompt-text features. Experiments on a self-built dataset of short Mandarin read-aloud texts achieved good results.
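The abstract does not describe the architecture in detail, so the following is only a minimal sketch of generic bidirectional (two-way) cross-attention between acoustic frames and prompt-text embeddings. It is not the authors' model. It assumes both sequences have already been projected to a shared dimension d. The function names (`cross_attention`, `bidirectional_attention`, `fuse`) and the concatenation-based fusion are hypothetical. Learned query/key/value projections, multiple heads, and the downstream phone recognizer are omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one row per frame or token, each row of length d


def softmax(row: List[float]) -> List[float]:
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    # Scaled dot-product attention: each query row returns a weighted
    # average of the value rows, weighted by its similarity to the keys.
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def bidirectional_attention(acoustic: Matrix, text: Matrix) -> Tuple[Matrix, Matrix]:
    # Audio-to-text: every acoustic frame attends over the prompt-text tokens.
    text_aware_audio = cross_attention(acoustic, text, text)
    # Text-to-audio: every prompt-text token attends over the acoustic frames.
    audio_aware_text = cross_attention(text, acoustic, acoustic)
    return text_aware_audio, audio_aware_text


def fuse(acoustic: Matrix, text_aware_audio: Matrix) -> Matrix:
    # Hypothetical fusion: concatenate each frame with its text-aware context.
    return [a + c for a, c in zip(acoustic, text_aware_audio)]


acoustic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy acoustic frames, d = 2
text = [[1.0, 0.0], [0.0, 1.0]]                  # 2 toy prompt-text tokens, d = 2
a2t, t2a = bidirectional_attention(acoustic, text)
fused = fuse(acoustic, a2t)                      # 3 rows of length 4
```

Attending in both directions lets each acoustic frame weigh the prompt tokens differently, which matches the abstract's premise that text features affect distinct acoustic characteristics to different degrees. The fused per-frame vectors would then feed a recognizer whose output is compared against the prompt.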
ISSN: 1742-6588 (print), 1742-6596 (online)
DOI: 10.1088/1742-6596/2670/1/012006