An Indonesian concatenative speech synthesis system




Bibliographic Details
Published in:	The Journal of the Acoustical Society of America, 2016-10, Vol. 140 (4), p. 2961
Authors:	Hirai, Toshio; Setiawan, Ivan
Format:	Article
Language:	English

Description
Abstract:	An Indonesian concatenative speech synthesis system was developed that can (1) differentiate between the pronunciations [e] and [@] (X-SAMPA) by using a pronunciation dictionary and pronunciation rules and (2) automatically create derived words, which are then inserted into the pronunciation dictionary used to convert text into phonemes. Speech data were recorded from a female native speaker of Indonesian. The phrases included proper nouns and text samples from Indonesian textbooks, newspaper articles, and TV scripts, which together covered almost all possible Indonesian syllables. There were 4,490 phrases lasting a total of 5.1 hours. The transcription was constructed taking into account the pauses made during narration. A forced alignment technique was used to obtain the phoneme boundaries in the speech, and each phoneme was added to a database together with its acoustic and linguistic features. In the synthesis step, the input text was converted into phonemes. An appropriate segment sequence, one whose features are similar to those the target phonemes should have and that shows the least concatenation distortion, was selected from the database and concatenated into a waveform. The synthesized speech was of acceptable quality, but we discovered some problems, such as sound discontinuities. The parameter weights of the cost function used to evaluate the similarity and the distortion should be optimized further.
ISSN:	0001-4966 (print); 1520-8524 (online)
DOI:	10.1121/1.4969146
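The text-to-phoneme step described in the abstract (dictionary lookup, automatic creation of derived words, and a rule-based fallback) can be illustrated with a minimal Python sketch. The abstract gives neither the dictionary contents, the affix inventory, nor the actual pronunciation rules, so everything below is an assumption: the two seed entries, the hypothetical affix tables, the small grapheme table, and the fallback rule that reads every orthographic "e" as schwa [@] are illustrative only. The names `add_derived_words`, `rule_g2p`, and `text_to_phonemes` are hypothetical.

```python
# Hypothetical seed dictionary: word -> X-SAMPA phonemes.
LEXICON = {
    "meja": ["m", "e", "dZ", "a"],       # orthographic 'e' pronounced [e]
    "besar": ["b", "@", "s", "a", "r"],  # orthographic 'e' pronounced [@]
}

# Hypothetical affix inventory with X-SAMPA pronunciations.
PREFIXES = {"ber": ["b", "@", "r"], "di": ["d", "i"]}
SUFFIXES = {"kan": ["k", "a", "n"], "an": ["a", "n"], "nya": ["J", "a"]}

# Hypothetical grapheme-to-phoneme tables; digraphs are checked first.
DIGRAPHS = {"ng": "N", "ny": "J", "sy": "S", "kh": "x"}
SINGLE = {"c": "tS", "j": "dZ", "y": "j"}


def add_derived_words(lexicon):
    """Insert every prefix/root/suffix combination not already listed.

    The pronunciation of a derived word is the plain concatenation of the
    affix and root pronunciations, so the root's [e]/[@] choice is inherited.
    """
    derived = {}
    affix_pre = [("", [])] + list(PREFIXES.items())
    affix_suf = [("", [])] + list(SUFFIXES.items())
    for root, pron in lexicon.items():
        for pre, ppron in affix_pre:
            for suf, spron in affix_suf:
                word = pre + root + suf
                if word not in lexicon:
                    derived.setdefault(word, ppron + pron + spron)
    lexicon.update(derived)


def rule_g2p(word):
    """Fallback letter-to-sound rules; assumes every 'e' is schwa [@]."""
    phones, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = word[i]
        phones.append("@" if ch == "e" else SINGLE.get(ch, ch))
        i += 1
    return phones


def text_to_phonemes(text, lexicon):
    """Look each word up in the dictionary, falling back to the rules."""
    phones = []
    for word in text.lower().split():
        phones.extend(lexicon[word] if word in lexicon else rule_g2p(word))
    return phones


if __name__ == "__main__":
    lex = dict(LEXICON)
    add_derived_words(lex)
    print(lex["dibesarkan"])  # ['d', 'i', 'b', '@', 's', 'a', 'r', 'k', 'a', 'n']
    print(text_to_phonemes("Mejanya besar", lex))
```

Because derived words are built by plain concatenation, this sketch overgenerates (it also creates forms such as "mejaan") and ignores morphophonemic changes such as the nasal assimilation of the meN- prefix; a practical system would need affix-specific rules.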
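The selection step, choosing a segment sequence that balances similarity to the target (a target cost) against concatenation distortion (a join cost) under adjustable weights, is commonly solved with a Viterbi search. The abstract does not state the search method, features, distance measures, or weight values, so this minimal Python sketch rests on assumptions: the pitch and duration target features, their normalization constants, the Euclidean distance between edge spectral frames, the zero join cost for units that were adjacent in the same recording, and the names `Unit`, `target_cost`, `concat_cost`, and `select_units` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str     # X-SAMPA label from forced alignment
    utt: int         # index of the recorded phrase
    pos: int         # position of this phone within the phrase
    pitch: float     # hypothetical target feature (Hz)
    duration: float  # hypothetical target feature (s)
    start: tuple     # hypothetical spectral frame at the left edge
    end: tuple       # hypothetical spectral frame at the right edge


def target_cost(unit, target):
    """Distance between a unit's features and the desired ones (assumed scales)."""
    return (abs(unit.pitch - target["pitch"]) / 100.0
            + abs(unit.duration - target["duration"]) / 0.05)


def concat_cost(left, right):
    """Join distortion; zero if the units were consecutive in one recording."""
    if left.utt == right.utt and right.pos == left.pos + 1:
        return 0.0
    return sum((a - b) ** 2 for a, b in zip(left.end, right.start)) ** 0.5


def select_units(targets, database, w_target=1.0, w_concat=1.0):
    """Viterbi search for the unit sequence with minimum weighted total cost."""
    if not targets:
        return [], 0.0
    candidates = [[u for u in database if u.phoneme == t["phoneme"]]
                  for t in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate unit for some target phoneme")
    # best[i][j] = (cumulative cost, backpointer) for candidate j at position i
    best = [[(w_target * target_cost(u, targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + w_concat * concat_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + w_target * target_cost(u, targets[i]), back))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total
```

The weights decide which units win. In a toy database where one candidate for [n] directly continues the preceding [a] in the same recording but has a slightly wrong pitch, while another matches the target pitch but joins poorly, the default weights pick the continuing unit, whereas `w_concat=0.0` picks the poorly joining one. This is one way the sound discontinuities mentioned in the abstract can arise when the weights are not tuned.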