Cross-language end-to-end emotional speech synthesis method and system
Format: | Patent |
---|---|
Language: | Chinese; English |
Summary: | The invention relates to the field of intelligent digital signal processing, and in particular to a cross-language end-to-end emotional speech synthesis method and system. Given a language-A text to be synthesized and an emotional reference utterance in language B, a deep neural network model trained with this method can synthesize natural, fluent language-A speech in the target speaker's voice with good emotional expressiveness. The method comprises the following steps: acquiring original training data as paired speech and text, extracting frequency-domain features from the speech, discretely encoding the text, extracting language-independent emotion embeddings, constructing a complete end-to-end emotional speech synthesis model, and performing supervised training. The speech synthesis model comprises an emotion-text fusion encoding module, a target duration prediction module, a posterior encoding module, an audio decoding module and a discriminator module. After the speech synthesis model is trained to convergence, the required e |
---|
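
Two of the data-preparation steps named in the summary, extracting frequency-domain features from speech and discretely encoding text, can be illustrated with a minimal sketch. The record does not say which features or text units the patent uses, so the log-magnitude short-time Fourier transform, the character-level vocabulary, and every function name and parameter below are assumptions chosen for illustration, not the patented method.

```python
import numpy as np

def log_magnitude_spectrogram(wave, n_fft=1024, hop=256):
    """Assumed feature: log |STFT| of a mono waveform (hypothetical choice)."""
    wave = np.asarray(wave, dtype=np.float64)
    if len(wave) < n_fft:
        # Pad short signals so that at least one full frame exists.
        wave = np.pad(wave, (0, n_fft - len(wave)))
    n_frames = 1 + (len(wave) - n_fft) // hop
    window = np.hanning(n_fft)
    # Slice overlapping frames and apply the Hann window to each one.
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0); result has shape (n_frames, n_fft // 2 + 1).
    return np.log(magnitude + 1e-6)

def build_vocab(texts):
    """Assumed text units: single characters. ID 0 is reserved for padding."""
    symbols = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(symbols)}

def encode_text(text, vocab):
    """Map each character to its integer ID (raises KeyError if unseen)."""
    return np.array([vocab[ch] for ch in text], dtype=np.int64)
```

For a one-second 440 Hz tone sampled at 16 kHz, `log_magnitude_spectrogram` returns one row per frame and one column per frequency bin, and `encode_text("hello", build_vocab(["hello world"]))` returns a sequence of integer IDs that an embedding layer could consume. A real system would more likely use mel-scaled features and phoneme-level units, but the record does not confirm either.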