Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
Format: | Article |
Language: | English |
Abstract: | Advances in text-to-speech (TTS) technology, which synthesizes spoken
language from text, have greatly facilitated access to digital content.
However, developing effective TTS for low-resource languages such as Central
Kurdish (CKB) remains challenging, mainly because of the lack of linguistic
information and dedicated resources. In this paper, we improve a Tacotron-based
Kurdish TTS system by training a Kurdish WaveGlow vocoder on a 21-hour Central
Kurdish speech corpus instead of using a WaveGlow vocoder pre-trained on
English. Training the vocoder on a target-language corpus is necessary to
capture the phonetic and prosodic variation of Kurdish accurately and fluently.
With these enhancements, our model performs significantly better than the
baseline system that uses English pre-trained models. In particular, our
adaptive WaveGlow model achieves a mean opinion score (MOS) of 4.91, which sets
a new benchmark for Kurdish speech synthesis. This study advances TTS
capabilities for Central Kurdish and opens the door to further development for
other Kurdish dialects and related languages. |
DOI: | 10.48550/arxiv.2409.13734 |
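The MOS cited in the abstract is the arithmetic mean of listener ratings, conventionally given on a 1 (bad) to 5 (excellent) scale. The following minimal Python sketch shows that arithmetic together with an approximate 95% confidence interval based on a normal approximation. The function name `mean_opinion_score` and the ratings used below are hypothetical illustrations. They do not come from the paper, which may have computed its interval differently or not reported one at all.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    Assumption: ratings are scores on a 1-5 listening-test scale, and the
    interval uses a normal approximation (z = 1.96 gives roughly 95%).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a confidence interval")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    # MOS is simply the mean of all listener ratings.
    mos = statistics.fmean(ratings)
    # Sample standard deviation (n - 1 denominator) scaled by sqrt(n)
    # gives the standard error of the mean.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    # Hypothetical ratings for one system, for illustration only.
    example = [5, 5, 4, 5, 5, 5, 4, 5, 5, 5]
    mos, ci = mean_opinion_score(example)
    print(f"MOS {mos:.2f} ± {ci:.2f}")  # prints "MOS 4.80 ± 0.26"
```

In practice, listening tests pool ratings across many listeners and utterances, so the interval narrows as the number of ratings grows. A single MOS value such as 4.91 is easiest to interpret alongside its interval and the number of ratings behind it.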