HMM-based Bahasa Indonesia speech synthesis system with hand-segmentation and labeling

Bibliographic details
Published in: The Journal of the Acoustical Society of America 2019-10, Vol.146 (4), p.2955-2956
Main authors: Anggrayni, Elok, Arifianto, Dhany, Sarwono, Joko
Format: Article
Language: English
Online access: Full text
Description
Abstract: In this paper, we compare the naturalness of Bahasa Indonesia speech synthesis when the speech transcription is created with festvox automatic segmentation and labeling versus hand segmentation and labeling. First, we developed a phonetically balanced speech corpus of 1549 declarative and question sentences uttered by six male and female speakers. We selected subsets of 47, 72, 119, 450, 929, and 1379 sentences for training while maintaining the phonetic balance. The objective is to find the smallest amount of training data needed for evaluating synthesized naturalness with both automatic and hand segmentation and labeling. With 47 training sentences, which took about 45 minutes to train, the Mel-cepstrum distortion was 2.9 for hand segmentation and labeling and 5.36 for automatic. With 1379 sentences and about 9 hours of training time, it improved to 2.46 for hand segmentation and labeling and 4.78 for automatic. The Mean Opinion Score was 3.98 for hand and 3.04 for automatic segmentation and labeling, an improvement of about 18%. The automatic segmentation and labeling introduced phoneme boundary errors, which suggests that segmentation and labeling require careful attention.
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.5137265
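
The abstract reports Mel-cepstrum distortion scores but gives neither the formula nor the unit. As a rough illustration only, the sketch below uses the commonly cited definition of mel-cepstral distortion in dB, averaged over frames. It assumes the reference and synthesized mel-cepstrum sequences are already time-aligned and that coefficient 0 (the energy term) is excluded. The function name and both assumptions are illustrative and are not taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean mel-cepstral distortion in dB between two aligned sequences.

    Each argument is a list of frames; each frame is a list of mel-cepstral
    coefficients. Index 0 is treated as the energy term and skipped
    (an assumption of this sketch, not something stated in the abstract).
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and time-aligned")

    # Constant factor of the usual definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)

    # Average the per-frame distortion over the utterance
    return total / len(ref_frames)
```

Under this definition a lower value means the synthesized spectrum is closer to the natural recording, which matches the abstract's pattern of lower scores for hand segmentation and labeling.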