TEXT-TO-SPEECH SYNTHESIS METHOD AND SYSTEM, AND A METHOD OF TRAINING A TEXT-TO-SPEECH SYNTHESIS SYSTEM
A text-to-speech synthesis method includes receiving text, inputting the received text in a synthesizer that includes a prediction network configured to convert the received text into speech data having a speech attribute that includes emotion, intention, projection, pace, and/or accent, and outputt...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A text-to-speech synthesis method includes receiving text, inputting the received text in a synthesizer that includes a prediction network configured to convert the received text into speech data having a speech attribute that includes emotion, intention, projection, pace, and/or accent, and outputting said speech data. The prediction network is obtained by obtaining a first sub-dataset and a second sub-dataset, where the first sub-dataset and the second sub-dataset each include audio samples and corresponding text, and the speech attribute of the audio samples of the second sub-dataset is more pronounced than the speech attribute of the audio samples of the first sub-dataset, training a first model using the first sub-dataset until a performance metric reaches a first predetermined value, training a second model by further training the first model using the second sub-dataset until the performance metric reaches a second predetermined value, and selecting one trained model as the prediction network. |
---|