Building a Speech Dataset and Recognition Model for the Minority Tu Language

Speech recognition technology has many applications in our daily life. However, for many low-resource languages without written forms, acquiring sufficient training data remains a significant challenge for building accurate ASR models. The Tu language, spoken by an ethnic minority group in Qinghai P...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied sciences 2024-08, Vol.14 (15), p.6795
Hauptverfasser: Kong, Shasha, Li, Chunmei, Fang, Chengwu, Yang, Peng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Speech recognition technology has many applications in our daily life. However, for many low-resource languages without written forms, acquiring sufficient training data remains a significant challenge for building accurate ASR models. The Tu language, spoken by an ethnic minority group in Qinghai Province in China, is one such example. Due to the lack of written records and the great diversity in regional pronunciations, there has been little previous research on Tu-language speech recognition. This work seeks to address this research gap by creating the first speech dataset for the Tu language spoken in Huzhu County, Qinghai. We first formulated the relevant pronunciation rules for the Tu language based on linguistic analysis. Then, we constructed a new speech corpus, named HZ-TuDs, through targeted data collection and annotation. Based on the HZ-TuDs dataset, we designed several baseline sequence-to-sequence deep neural models for end-to-end Tu-language speech recognition. Additionally, we proposed a novel SA-conformer model, which combines convolutional and channel attention modules to better extract speech features. Experiments showed that our proposed SA-conformer model can significantly reduce the character error rate from 23% to 12%, effectively improving the accuracy of Tu language recognition compared to previous approaches. This demonstrates the effectiveness of our dataset construction and model design efforts in advancing speech recognition technology for this low-resource minority language.
ISSN:2076-3417
2076-3417
DOI:10.3390/app14156795