A TEXT-TO-SPEECH SYNTHESIS METHOD AND SYSTEM, A METHOD OF TRAINING A TEXT-TO-SPEECH SYNTHESIS SYSTEM, AND A METHOD OF CALCULATING AN EXPRESSIVITY SCORE

Bibliographic details
Main authors: FLYNN, John; QURESHI, Zeenat
Format: Patent
Language: English; French; German
Description
Abstract: A method includes receiving text and inputting the received text into a prediction network. The method further includes generating, using the prediction network, speech data. The prediction network comprises a neural network that is trained to generate expressive speech data from text. The neural network is trained by: receiving a first training dataset comprising audio data and corresponding text data; acquiring a respective expressivity score for each audio sample of the audio data; selecting, from the first training dataset, a first subset of training data based on the respective expressivity scores of the audio data in the first training dataset; generating, for the first subset of training data, prediction audio data for the corresponding text data; and comparing the prediction audio data to the audio data of the first subset of training data.
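The training procedure in the abstract (score every audio sample, select a subset by score, predict audio for the subset's text, compare prediction to recording) can be illustrated with a minimal sketch. The abstract does not say how the expressivity score is calculated, what rule selects the subset, how audio is represented, or which comparison metric is used. The code below therefore assumes, purely for illustration, that audio is a list of per-frame values, that the score is the population standard deviation of those values (`placeholder_expressivity`), that the subset is every sample at or above a fixed threshold, and that the comparison is a mean absolute (L1) error. All names (`Sample`, `acquire_scores`, `select_subset`, `subset_loss`, `toy_predict`) are hypothetical and are not taken from the patent.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    """One (audio, text) pair from the first training dataset."""
    text: str
    audio: List[float]         # assumed: per-frame acoustic values
    expressivity: float = 0.0  # set by acquire_scores()


def placeholder_expressivity(audio: Sequence[float]) -> float:
    """HYPOTHETICAL score: population standard deviation of the frame values.

    The abstract does not say how the expressivity score is calculated;
    this proxy only stands in for the real scoring method.
    """
    return statistics.pstdev(audio) if len(audio) > 1 else 0.0


def acquire_scores(samples: List[Sample],
                   score_fn: Callable[[Sequence[float]], float]) -> None:
    """Attach an expressivity score to each audio sample in the dataset."""
    for s in samples:
        s.expressivity = score_fn(s.audio)


def select_subset(samples: List[Sample], threshold: float) -> List[Sample]:
    """Select the first subset: samples scoring at or above an ASSUMED threshold."""
    return [s for s in samples if s.expressivity >= threshold]


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between prediction audio data and recorded audio data."""
    if not target or len(pred) != len(target):
        raise ValueError("prediction and target must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def subset_loss(subset: List[Sample],
                predict: Callable[[str, int], List[float]]) -> float:
    """Generate prediction audio for each text, compare it to the recording,
    and return the mean loss over the subset."""
    if not subset:
        raise ValueError("the selected subset is empty")
    losses = [l1_loss(predict(s.text, len(s.audio)), s.audio) for s in subset]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    data = [
        Sample("flat reading", [0.5, 0.5, 0.5, 0.5]),
        Sample("lively reading", [0.1, 0.9, 0.2, 0.8]),
    ]
    acquire_scores(data, placeholder_expressivity)
    subset = select_subset(data, threshold=0.2)
    print([s.text for s in subset])  # ['lively reading']

    # HYPOTHETICAL stand-in for the prediction network: a constant output.
    def toy_predict(text: str, n_frames: int) -> List[float]:
        return [0.5] * n_frames

    print(round(subset_loss(subset, toy_predict), 4))  # 0.35
```

In an actual neural text-to-speech model, the value returned by the comparison step would serve as the training loss and drive gradient updates of the network; the sketch stops at computing it. A top-k or percentile rule would work just as well as the fixed threshold assumed here, and neither is claimed to be the selection rule the patent uses.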