SUST TTS Corpus: A phonetically-balanced corpus for Bangla text-to-speech synthesis

Bibliographic details
Published in:	Acoustical Science and Technology 2021/11/01, Vol.42(6), pp.326-332
Main authors:	Ahmad, Arif; Selim, Md. Reza; Iqbal, Md. Zafar; Rahman, M. Shahidur
Format:	Article
Language:	English
Subjects:	Acoustic insulation; Bangla TTS; Datasets; Deep learning; Languages; Machine learning; Merlin TTS; Phonetically balanced corpus; Speech; Speech processing; Speech recognition; Speech synthesis
Online access:	Full text

Description
Summary:	This paper presents the Shahjalal University of Science and Technology Text-To-Speech Corpus (SUST TTS Corpus), a phonetically balanced speech corpus for Bangla speech synthesis. With the advancement of deep learning techniques, modern speech processing research, such as speech recognition and speech synthesis, increasingly relies on deep learning methods. Any state-of-the-art neural TTS system needs a large dataset to be trained efficiently. The lack of such datasets for under-resourced languages like Bangla is a major obstacle to developing TTS systems in those languages. To mitigate this problem and accelerate speech synthesis research in Bangla, we have developed a large-scale, phonetically balanced speech corpus containing more than 30 hours of speech. Our corpus includes 17,357 utterances spoken by a professional voice talent in a soundproof audio laboratory. We ensure that the corpus contains all possible Bangla phonetic units in sufficient amounts, making it phonetically balanced. We describe the process of creating the corpus in this paper. We also train a neural Bangla TTS system on our corpus and obtain a synthetic voice comparable to that of state-of-the-art TTS systems.
ISSN:	1346-3969 1347-5177
DOI:	10.1250/ast.42.326