A TEXT-TO-SPEECH SYNTHESIS METHOD AND SYSTEM, AND A METHOD OF TRAINING A TEXT-TO-SPEECH SYNTHESIS SYSTEM

A text-to-speech synthesis method includes receiving text, inputting the received text in a synthesizer that includes a prediction network configured to convert the received text into speech data having a speech attribute that includes emotion, intention, projection, pace, and/or accent, and outputt...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: NOLASCO, Ines Almeida, VAUGHAN, Felix, FLYNN, John, QURESHI, Zeenat
Format: Patent
Sprache:eng ; fre ; ger
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A text-to-speech synthesis method includes receiving text, inputting the received text in a synthesizer that includes a prediction network configured to convert the received text into speech data having a speech attribute that includes emotion, intention, projection, pace, and/or accent, and outputting said speech data. The prediction network is obtained by obtaining a first sub-dataset and a second sub-dataset, where the first sub-dataset and the second sub-dataset each include audio samples and corresponding text, and the speech attribute of the audio samples of the second sub-dataset is more pronounced than the speech attribute of the audio samples of the first sub-dataset, training a first model using the first sub-dataset until a performance metric reaches a first predetermined value, training a second model by further training the first model using the second sub-dataset until the performance metric reaches a second predetermined value, and selecting one trained model as the prediction network.