Cross-Dialect Adaptation Framework for Constructing Prosodic Models for Chinese Dialect Text-to-Speech Systems
Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018-01, Vol.26 (1), p.108-121 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | This paper presents an efficient cross-dialect adaptation framework for constructing prosodic models for Chinese dialect text-to-speech systems. In this framework, dialect prosodic models are adapted from an existing Mandarin speaking rate-dependent hierarchical prosodic model. The rationale of the framework is based on the cross-dialectal similarities between Mandarin and other Chinese dialects in terms of syntactic and prosodic structures. Two main problems are addressed in this study: One problem pertains to the use of cross-dialectal similarities in the design and adaptation of the dialect speaking rate-dependent hierarchical prosodic model. The other problem pertains to the data sparseness caused by the insufficiency of an adaptation corpus covering essential linguistic contexts and prosodic events as well as a wide speaking rate range. This problem is solved by employing the structural maximum a posteriori method that hierarchically organizes the dialect speaking rate-dependent hierarchical prosodic model parameters into decision trees to facilitate parameter estimations. The effectiveness of the proposed approach was evaluated by experiments on two Chinese dialects: Min and Hakka. Objective and subjective evaluations demonstrated that the prosodic features generated by the dialect speaking rate-dependent hierarchical prosodic models were quite natural in various speaking rates ranging from 3.3 to 6.7 syllables per second. These results confirm that the proposed cross-dialect adaptation framework is effective and promising. |
---|---|
ISSN: | 2329-9290 2329-9304 |
DOI: | 10.1109/TASLP.2017.2762432 |