Expressive TTS Training With Frame and Style Reconstruction Loss

We propose a novel training strategy for Tacotron-based text-to-speech (TTS) systems that improves speech styling at the utterance level. One of the key challenges in prosody modeling is the lack of a reference, which makes explicit modeling difficult. The proposed technique doesn't require prosody...

Bibliographic details
Published in:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, Vol. 29, pp. 1806-1818
Authors:	Liu, Rui; Sisman, Berrak; Gao, Guanglai; Li, Haizhou
Format:	Article
Language:	English
Keywords:	Annotations, Decoding, Emotion recognition, Expressive speech synthesis, Frame and style reconstruction loss, Hidden Markov models, Linguistics, Loss measurement, Modelling, Naturalness, Prosody, Reconstruction, Speech perception, Speech recognition, Speech styles, Stress, Styling, Synthesis, Tacotron, Task analysis, Text-to-speech, Training, Training data, Voice recognition