Multi-language speech synthesis model training method and device

Bibliographic details
Main authors: YAN YONGHONG, ZHANG PENGYUAN, SHANG ZENGQIANG
Format: Patent
Language: Chinese; English
Description
Abstract: Embodiments of the invention provide a method and device for training a multi-language speech synthesis model. The method comprises: training a style encoder, a text encoder and a decoder on the Mel-spectrogram feature labels, sample phoneme sequences and speaker identification labels of the sample audio in each sample language, so as to obtain a style encoder, text encoder and decoder that disentangle the timbre (represented by the speaker identifier), the style and the text content of the audio; and then training a style predictor, with the speaker identification label and sample phoneme sequence of each sample audio as inputs and the style vector that the trained style encoder outputs for that sample audio as the label, to obtain the multi-language speech synthesis model.
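The abstract describes a two-stage procedure without giving architectures, losses or sizes. The following Python sketch illustrates only the data flow, under stated assumptions: every module is a single linear map, each utterance is reduced to one mel vector and a normalised phoneme histogram, the data are random numbers rather than audio, and stage 1 is assumed to optimise one joint mel-reconstruction MSE. All names (`train_stage1`, `train_style_predictor`, `Ws`, `Wp` and so on) are hypothetical. Stage 1 fits the style encoder, text encoder, speaker-ID embedding and decoder; stage 2 keeps the style encoder fixed and uses its output for each sample as the regression label for a style predictor whose inputs are only the speaker ID and the phonemes. The toy makes no attempt at the timbre/style/content disentanglement the abstract claims.

```python
import random

# Toy sizes (hypothetical; the abstract gives no dimensions or architectures).
N_MEL, N_PHONES, N_SPEAKERS = 4, 3, 2
D_TEXT, D_STYLE, D_SPK = 2, 2, 2

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def rand_matrix(rng, rows, cols, scale=0.3):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add_outer(G, u, v):
    """G += u v^T, i.e. accumulate the weight gradient of a linear layer."""
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            G[i][j] += ui * vj

def step(W, G, lr):
    """Plain gradient descent W -= lr * G, applied in place."""
    for row, grow in zip(W, G):
        for j, g in enumerate(grow):
            row[j] -= lr * g

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def make_toy_data(rng, n=8):
    """Random (mel label, phoneme features, speaker id) triples; not real audio."""
    data = []
    for _ in range(n):
        mel = [rng.uniform(-1, 1) for _ in range(N_MEL)]  # one mel vector per utterance
        counts = [rng.randint(1, 3) for _ in range(N_PHONES)]
        phones = [c / sum(counts) for c in counts]  # normalised phoneme histogram
        data.append((mel, phones, rng.randrange(N_SPEAKERS)))
    return data

def forward(P, mel, phones, spk):
    """Decoder input z = [text embedding | style vector | speaker embedding]."""
    z = matvec(P["Wt"], phones) + matvec(P["Ws"], mel) + P["Emb"][spk]
    return z, matvec(P["Wd"], z)

def stage1_loss(P, data):
    return sum(mse(forward(P, m, p, s)[1], m) for m, p, s in data) / len(data)

def train_stage1(data, steps=300, lr=0.1, seed=0):
    """Stage 1 (assumed joint objective): fit style encoder Ws, text encoder Wt,
    speaker table Emb and decoder Wd by reconstructing the mel label."""
    rng = random.Random(seed)
    P = {"Ws": rand_matrix(rng, D_STYLE, N_MEL), "Wt": rand_matrix(rng, D_TEXT, N_PHONES),
         "Emb": rand_matrix(rng, N_SPEAKERS, D_SPK),
         "Wd": rand_matrix(rng, N_MEL, D_TEXT + D_STYLE + D_SPK)}
    n, history = len(data), []
    for _ in range(steps):
        history.append(stage1_loss(P, data))
        G = {k: zeros(len(W), len(W[0])) for k, W in P.items()}
        for mel, phones, spk in data:
            z, out = forward(P, mel, phones, spk)
            g = [2 * (o - m) / (N_MEL * n) for o, m in zip(out, mel)]  # dLoss/dout
            add_outer(G["Wd"], g, z)
            dz = [sum(P["Wd"][i][j] * g[i] for i in range(N_MEL)) for j in range(len(z))]
            add_outer(G["Wt"], dz[:D_TEXT], phones)
            add_outer(G["Ws"], dz[D_TEXT:D_TEXT + D_STYLE], mel)
            for a, d in enumerate(dz[D_TEXT + D_STYLE:]):
                G["Emb"][spk][a] += d
        for k in P:
            step(P[k], G[k], lr)
    history.append(stage1_loss(P, data))
    return P, history

def train_style_predictor(data, Ws, steps=300, lr=0.1, seed=1):
    """Stage 2: regress (phonemes, one-hot speaker id) -> style vector, using the
    frozen stage-1 style encoder output Ws @ mel as the label."""
    rng = random.Random(seed)
    Wp = rand_matrix(rng, D_STYLE, N_PHONES + N_SPEAKERS)
    examples = [(phones + [float(i == spk) for i in range(N_SPEAKERS)], matvec(Ws, mel))
                for mel, phones, spk in data]  # Ws is only read, never updated
    n, history = len(examples), []

    def loss():
        return sum(mse(matvec(Wp, f), t) for f, t in examples) / n

    for _ in range(steps):
        history.append(loss())
        G = zeros(len(Wp), len(Wp[0]))
        for f, t in examples:
            g = [2 * (y - y_t) / (D_STYLE * n) for y, y_t in zip(matvec(Wp, f), t)]
            add_outer(G, g, f)
        step(Wp, G, lr)
    history.append(loss())
    return Wp, history

data = make_toy_data(random.Random(42))
params, h1 = train_stage1(data)
predictor, h2 = train_style_predictor(data, params["Ws"])
print(f"stage 1 loss {h1[0]:.4f} -> {h1[-1]:.4f}")
print(f"stage 2 loss {h2[0]:.4f} -> {h2[-1]:.4f}")
```

Running it prints each stage's loss before and after training. Because the stage-2 labels come from the frozen encoder, the predictor's inputs never include the mel features.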