Cross-language end-to-end emotional speech synthesis method and system


Bibliographic details
Main authors: HUA HUA, WANG LI, LI TA, ZHANG PENGYUAN, SHANG ZENGQIANG
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to the field of intelligent digital signal processing, in particular to a cross-language end-to-end emotional speech synthesis method and system. A deep neural network model trained with this method can synthesize natural, fluent speech of a target speaker in language A with good emotional expression, given a language-A text to be synthesized and an emotional reference utterance in language B. The method comprises the following steps: acquiring paired speech-text training data, extracting frequency-domain features from the speech, discretely encoding the text, extracting language-independent emotion embeddings, constructing a complete end-to-end emotional speech synthesis model, and performing supervised training. The speech synthesis model comprises an emotion-text fusion encoding module, a target duration prediction module, a posterior encoding module, an audio decoding module and a discrimination module. After the speech synthesis model is trained to converge, the required e