Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis

Speaker adaptation in speech synthesis transforms a source utterance to a target utterance that differs from the source in terms of voice characteristics. In this paper, we employ vocal tract length normalization, which is generally used in speech recognition to remove individual speaker characteris...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of Information Science and Engineering 2014-07, Vol.30 (4), p.1149-1166
Hauptverfasser: 高伟勋(Wei-Xun Gao), 曹奇英(Qi-Ying Cao)
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Speaker adaptation in speech synthesis transforms a source utterance to a target utterance that differs from the source in terms of voice characteristics. In this paper, we employ vocal tract length normalization, which is generally used in speech recognition to remove individual speaker characteristics, to speaker adaptation in speech synthesis. We propose a frequency warping approach based on a time-varying bilinear function to reduce the weighted spectral distance between the source speaker and the target speaker. The warped spectra of the source speaker are then converted to line spectrum pairs to train hidden Markov models (HMM). HMMs are further adapted by algorithms based on maximum likelihood linear regression with the target speaker's data. The experimental results show that our frequency warping approach can make the warped spectra of the source speaker closer to the target speaker, and the resultant adapted HMMs perform better than the HMMs trained by unwrapped spectra in terms of synthesized speech naturalness and speaker similarity.
ISSN:1016-2364
DOI:10.6688/JISE.2014.30.4.13