Self-Supervised Pre-Trained Speech Representation Based End-to-End Mispronunciation Detection and Diagnosis of Mandarin

Mispronunciation Detection and Diagnosis (MDD) is an essential basic technology in Computer-Assisted Pronunciation Training (CAPT) and Computer-Assisted Language Learning (CALL). MDD research in Mandarin is faced with the problem of lack of relevant data, which is a typical low-resource scenario. In...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE access 2022, Vol.10, p.1-1
Hauptverfasser: Shen, Yunfei, Liu, Qingqing, Fan, Zhixing, Liu, Jiajun, Wumaier, Aishan
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Mispronunciation Detection and Diagnosis (MDD) is an essential basic technology in Computer-Assisted Pronunciation Training (CAPT) and Computer-Assisted Language Learning (CALL). MDD research in Mandarin is faced with the problem of lack of relevant data, which is a typical low-resource scenario. In recent years, self-supervised pre-trained speech representation has developed rapidly and achieved significant performance improvement in low-resource speech recognition scenarios, making it necessary to be applied to MDD tasks. First, we build a Mandarin MDD dataset called PSC-Reading for the Putonghua Proficiency Test (PSC) passage reading section. Then we extended the end-to-end MDD system based on CTC/Attention hybrid architecture and Transformer architecture, using features extracted from self-supervised pre-training speech representation models such as Wav2Vec 2.0 and WavLM to replace conventional speech features like MFCC and Fbank, and conduct experiments on the PSC-Reading dataset. Experimental results show that, compared with the baseline model CNN-RNN-CTC, our WavLM-based model obtains 20.5% realtive improvement on the F1 score metric.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2022.3212417