Expressive TTS Training With Frame and Style Reconstruction Loss

We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that improves the speech styling at utterance level. One of the key challenges in prosody modeling is the lack of reference that makes explicit modeling difficult. The proposed technique doesn't require prosody...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE/ACM transactions on audio, speech, and language processing speech, and language processing, 2021, Vol.29, p.1806-1818
Hauptverfasser: Liu, Rui, Sisman, Berrak, Gao, Guanglai, Li, Haizhou
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!