Cross-language end-to-end emotional speech synthesis method and system

Bibliographic details
Main authors:	HUA HUA, WANG LI, LI TA, ZHANG PENGYUAN, SHANG ZENGQIANG
Format:	Patent
Language:	chi ; eng
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	The invention relates to the field of intelligent digital signal processing, in particular to a cross-language end-to-end emotional speech synthesis method and system. A deep neural network model trained with this method takes a text in language A to be synthesized and an emotional reference utterance in language B, and synthesizes natural, fluent speech in language A in the target speaker's voice with good emotional expressiveness. The method comprises the following steps: acquiring original training data as paired speech and text, extracting frequency-domain features from the speech, discretely encoding the text, extracting language-independent emotion embeddings, constructing a complete end-to-end emotional speech synthesis model, and performing supervised training. The speech synthesis model comprises an emotion-text fusion encoding module, a target duration prediction module, a posterior encoding module, an audio decoding module and a discrimination module. After the speech synthesis model is trained to converge, the required e