Multi-language speech synthesis model training method and device

Bibliographic details
Main authors:	YAN YONGHONG, ZHANG PENGYUAN, SHANG ZENGQIANG
Format:	Patent
Language:	Chinese; English
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Abstract:	Embodiments of the invention provide a method and device for training a multi-language speech synthesis model. A style encoder, a text encoder and a decoder are trained on the Mel-spectrum feature labels, sample phoneme sequences and speaker-identifier labels of the sample audio of each sample language, yielding a style encoder, text encoder and decoder that can decouple the timbre (represented by the speaker identifier), the style and the text content of the audio. A style predictor is then trained using the speaker-identifier label and sample phoneme sequence of each sample audio, with the style vector that the trained style encoder outputs for that audio serving as the label, to obtain the multi-language speech synthesis model.
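
The two training stages described in the abstract can be sketched as a data flow. This is only an illustrative sketch, not the patent's implementation: the names `Sample`, `Modules`, `stage1_loss` and `stage2_loss`, the mean-squared-error loss, and the assumption that the style predictor takes the phoneme sequence and speaker identifier as inputs are all hypothetical, and the toy callables stand in for real neural networks. No optimizer is shown; the code only makes explicit which module outputs are compared against which labels.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors (assumed loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


@dataclass
class Sample:
    mel: Vector            # Mel-spectrum feature label (flattened toy vector)
    phonemes: List[int]    # sample phoneme sequence
    speaker_id: int        # speaker-identifier label


@dataclass
class Modules:
    # Hypothetical stand-ins for the four networks named in the abstract.
    style_encoder: Callable[[Vector], Vector]
    text_encoder: Callable[[Sequence[int]], Vector]
    decoder: Callable[[Vector, Vector, int], Vector]
    style_predictor: Callable[[Sequence[int], int], Vector]


def stage1_loss(m: Modules, s: Sample) -> float:
    """Stage 1: style encoder, text encoder and decoder reconstruct the Mel label."""
    style = m.style_encoder(s.mel)
    text = m.text_encoder(s.phonemes)
    mel_pred = m.decoder(text, style, s.speaker_id)
    return mse(mel_pred, s.mel)


def stage2_loss(m: Modules, s: Sample) -> float:
    """Stage 2: the trained style encoder's output is the label for the style predictor."""
    target_style = m.style_encoder(s.mel)  # used only as a label in this stage
    predicted_style = m.style_predictor(s.phonemes, s.speaker_id)
    return mse(predicted_style, target_style)


# Toy stand-ins so the sketch runs end to end (not real models).
toy = Modules(
    style_encoder=lambda mel: [sum(mel) / len(mel)],
    text_encoder=lambda phonemes: [float(len(phonemes))],
    decoder=lambda text, style, speaker_id: [style[0]] * 4,
    style_predictor=lambda phonemes, speaker_id: [0.0],
)
sample = Sample(mel=[1.0, 1.0, 1.0, 1.0], phonemes=[3, 7, 2], speaker_id=0)

if __name__ == "__main__":
    print("stage 1 loss:", stage1_loss(toy, sample))
    print("stage 2 loss:", stage2_loss(toy, sample))
```

In a real system each callable would be a trainable network and the two losses would be minimized in sequence, with the first-stage modules already trained before the style predictor is fitted.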