Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2009-08, Vol. 17 (6), pp. 1208-1230
Main authors:	Yamagishi, J.; Nose, T.; Zen, H.; Ling, Zhen-Hua; Toda, T.; Tokuda, K.; King, S.; Renals, S.
Format:	Article
Language:	English
Keywords:	Applied sciences; Average voice; Computer science; Construction; Exact sciences and technology; Hidden Markov models; HMM Speech Synthesis System; HMM-based speech synthesis; HTS; Information science; Information, signal and communications theory; Mathematical models; Natural language processing; Robustness; Sentences; Signal processing; speaker adaptation; Speech; Speech analysis; Speech processing; Speech recognition; Speech synthesis; Telecommunications and information theory; Transforms; voice conversion

Description
Abstract:	This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called “HTS-2007,” employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison study with several speech synthesis techniques shows the new system is very robust: It is able to build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences.
ISSN:	1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI:	10.1109/TASL.2009.2016394