Mongolian emotional speech synthesis method based on pre-training model and improved Tacotron2 model


Bibliographic details
Main authors:	ZHOU CHAO, REN-QING DAOERJI, JI YATU, LI LEIXIAO, SHI BAO
Format:	Patent
Language:	Chinese; English
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	A Mongolian emotional speech synthesis method based on a pre-training model and an improved Tacotron2 model comprises the following steps: performing letter-to-phoneme conversion on Mongolian text to obtain phoneme sequence data; extracting a Mel spectrum from Mongolian emotional audio; inputting the phoneme sequence data and the Mel spectrum into a joint speech-text pre-training model and training the alignment information between speech and text; adding a text analysis module and an emotion analysis module to the Tacotron2 acoustic model to obtain an improved model, which serves as the generator for Mongolian emotional speech synthesis; having the generator take the output of the joint speech-text pre-training model as input and output a Mel spectrum; and inputting the Mel spectrum into a vocoder, which converts the acoustic features into speech waveforms to complete Mongolian emotional speech synthesis. The emotional speech can be synthesized directly from the characters,
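
The Mel spectrum extraction step named in the summary can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the record gives no signal-processing parameters, so the sample rate (22050 Hz), FFT size (1024), hop length (256) and 80 Mel bands below are assumptions borrowed from common Tacotron2 setups, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel formulas exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 band edges, equally spaced on the Mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a log-Mel spectrogram of shape (n_frames, n_mels)."""
    wave = np.asarray(wave, dtype=np.float64)
    if len(wave) < n_fft:
        # Zero-pad short inputs so that at least one frame exists.
        wave = np.pad(wave, (0, n_fft - len(wave)))
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T
    # Floor before the log to avoid log(0) on silent frames.
    return np.log(np.maximum(mel, 1e-5))

if __name__ == "__main__":
    # One second of a 440 Hz tone stands in for real emotional speech audio.
    t = np.arange(22050) / 22050.0
    spec = log_mel_spectrogram(np.sin(2.0 * np.pi * 440.0 * t))
    print(spec.shape)
```

In a pipeline like the one summarized above, an array of this kind would be paired with the phoneme sequence as training input and would also be the target that the acoustic model predicts before the vocoder stage.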