Cross-Dialect Adaptation Framework for Constructing Prosodic Models for Chinese Dialect Text-to-Speech Systems

Bibliographic Details
Published in:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018-01, Vol. 26 (1), pp. 108-121
Main author:	Chiang, Chen-Yu
Format:	Article
Language:	English
Subjects:	Adaptation models; Chinese dialect; cross-dialect adaptation; Hidden Markov models; Mandarin; Noise measurement; Pragmatics; Si-Xian Hakka; SMAP; speaking rate; Speech; Speech processing; SR-HPM; Taiwan Min; Training

Description
Abstract:	This paper presents an efficient cross-dialect adaptation framework for constructing prosodic models for Chinese dialect text-to-speech systems. In this framework, dialect prosodic models are adapted from an existing Mandarin speaking rate-dependent hierarchical prosodic model. The rationale of the framework is based on the cross-dialectal similarities between Mandarin and other Chinese dialects in terms of syntactic and prosodic structures. Two main problems are addressed in this study: One problem pertains to the use of cross-dialectal similarities in the design and adaptation of the dialect speaking rate-dependent hierarchical prosodic model. The other problem pertains to the data sparseness caused by the insufficiency of an adaptation corpus covering essential linguistic contexts and prosodic events as well as a wide speaking rate range. This problem is solved by employing the structural maximum a posteriori method that hierarchically organizes the dialect speaking rate-dependent hierarchical prosodic model parameters into decision trees to facilitate parameter estimations. The effectiveness of the proposed approach was evaluated by experiments on two Chinese dialects: Min and Hakka. Objective and subjective evaluations demonstrated that the prosodic features generated by the dialect speaking rate-dependent hierarchical prosodic models were quite natural in various speaking rates ranging from 3.3 to 6.7 syllables per second. These results confirm that the proposed cross-dialect adaptation framework is effective and promising.
ISSN:	2329-9290, 2329-9304
DOI:	10.1109/TASLP.2017.2762432