Self-Supervised Pre-Trained Speech Representation Based End-to-End Mispronunciation Detection and Diagnosis of Mandarin
Published in: | IEEE Access 2022, Vol.10, p.1-1 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Mispronunciation Detection and Diagnosis (MDD) is a core technology in Computer-Assisted Pronunciation Training (CAPT) and Computer-Assisted Language Learning (CALL). MDD research for Mandarin suffers from a lack of relevant data, making it a typical low-resource scenario. In recent years, self-supervised pre-trained speech representations have developed rapidly and achieved significant performance improvements in low-resource speech recognition, which motivates applying them to MDD tasks. First, we build a Mandarin MDD dataset called PSC-Reading for the passage reading section of the Putonghua Proficiency Test (PSC). Then we extend end-to-end MDD systems based on the CTC/Attention hybrid architecture and the Transformer architecture, using features extracted from self-supervised pre-trained speech representation models such as Wav2Vec 2.0 and WavLM to replace conventional speech features such as MFCC and Fbank, and conduct experiments on the PSC-Reading dataset. Experimental results show that, compared with the baseline CNN-RNN-CTC model, our WavLM-based model obtains a 20.5% relative improvement in F1 score. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2022.3212417 |