Multi-language speech synthesis model training method and device
Format: Patent
Language: Chinese; English
Abstract: Embodiments of this specification provide a method and device for training a multi-language speech synthesis model. The method comprises: training a style encoder, a text encoder and a decoder based on the Mel-spectrogram feature labels, sample phoneme sequences and speaker identifier labels of the sample audio of each sample language, so as to obtain a style encoder, text encoder and decoder that decouple the timbre (represented by the speaker identifier), the style and the text content of the audio; and then training a style predictor using the speaker identifier labels and sample phoneme sequences of the sample audio, with the style vector of each sample audio output by the trained style encoder serving as the label, to obtain the multi-language speech synthesis model.
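The abstract describes a two-stage schedule: first reconstruct Mel spectrograms with a style encoder, text encoder and decoder conditioned on a speaker identifier, then train a style predictor to regress the already-trained style encoder's output from the speaker identifier and phoneme sequence. The following is a minimal pure-Python sketch of that data flow only, under assumptions not taken from the patent: one scalar "mel" frame per phoneme, lookup tables in place of neural networks, a parameter-free additive decoder, and plain SGD on mean squared error. All names (`make_toy_data`, `train_stage1`, `train_stage2`, `synthesize`) are hypothetical, and the toy does not demonstrate that timbre, style and content actually end up disentangled.

```python
import random
from statistics import mean

# ASSUMPTIONS (not from the patent): 1-D "mel" frames, one frame per phoneme,
# lookup tables instead of neural networks, plain SGD on mean squared error.
N_PHONEMES, N_SPEAKERS = 6, 2


def make_toy_data(seed=0):
    """Hypothetical corpus of (phoneme ids, speaker id, mel frames).
    Phonemes 0-2 stand in for one language and 3-5 for another."""
    rng = random.Random(seed)
    ph_val = [rng.uniform(-1.0, 1.0) for _ in range(N_PHONEMES)]
    spk_val = [0.5, -0.5]
    data = []
    for inventory in ([0, 1, 2], [3, 4, 5]):
        for spk in range(N_SPEAKERS):
            for _ in range(4):
                style = rng.gauss(0.0, 0.3)  # per-utterance "style" offset
                seq = [rng.choice(inventory) for _ in range(5)]
                mel = [ph_val[p] + spk_val[spk] + style for p in seq]
                data.append((seq, spk, mel))
    return data


def style_encode(model, mel):
    # Style encoder: whole-utterance mel frames -> one scalar style value.
    return model["a"] * mean(mel) + model["b"]


def decode(model, seq, spk, style):
    # Parameter-free decoder: text-encoder output (phoneme table)
    # + speaker embedding + style, one output frame per phoneme.
    return [model["ph"][p] + model["spk"][spk] + style for p in seq]


def train_stage1(data, epochs=200, lr=0.05):
    """Stage 1: jointly fit the style encoder, text encoder and speaker table
    by reconstructing the mel frames (the mel frames act as labels)."""
    model = {"a": 0.0, "b": 0.0, "ph": [0.0] * N_PHONEMES, "spk": [0.0] * N_SPEAKERS}
    history = []
    for _ in range(epochs):
        total = 0.0
        for seq, spk, mel in data:
            m = mean(mel)
            out = decode(model, seq, spk, style_encode(model, mel))
            errs = [y_hat - y for y_hat, y in zip(out, mel)]
            n = len(errs)
            total += sum(e * e for e in errs) / n
            g_style = sum(2 * e / n for e in errs)  # dLoss/dStyle
            for p, e in zip(seq, errs):
                model["ph"][p] -= lr * 2 * e / n
            model["spk"][spk] -= lr * g_style
            model["a"] -= lr * g_style * m  # chain rule through style = a*m + b
            model["b"] -= lr * g_style
        history.append(total / len(data))
    return model, history


def predict_style(pred, seq, spk):
    # Style predictor: speaker id + phoneme sequence -> scalar style (no audio).
    return pred["spk"][spk] + mean(pred["ph"][p] for p in seq)


def train_stage2(data, model, epochs=200, lr=0.05):
    """Stage 2: regress the predictor onto style values from the trained style
    encoder. `model` is only read here, so the encoder stays frozen."""
    targets = [style_encode(model, mel) for _, _, mel in data]  # fixed labels
    pred = {"ph": [0.0] * N_PHONEMES, "spk": [0.0] * N_SPEAKERS}
    history = []
    for _ in range(epochs):
        total = 0.0
        for (seq, spk, _), target in zip(data, targets):
            e = predict_style(pred, seq, spk) - target
            total += e * e
            pred["spk"][spk] -= lr * 2 * e
            for p in seq:  # d mean(...) / d pred["ph"][p] = count(p) / len(seq)
                pred["ph"][p] -= lr * 2 * e / len(seq)
        history.append(total / len(data))
    return pred, history


def synthesize(model, pred, seq, spk):
    # Inference: the predicted style replaces the style encoder's output.
    return decode(model, seq, spk, predict_style(pred, seq, spk))


if __name__ == "__main__":
    data = make_toy_data()
    model, h1 = train_stage1(data)
    pred, h2 = train_stage2(data, model)
    print("stage 1 loss:", h1[0], "->", h1[-1])
    print("stage 2 loss:", h2[0], "->", h2[-1])
    print(synthesize(model, pred, [0, 3, 1], spk=1))
```

The point carried over from the abstract is that stage 2 only reads the stage-1 model: the style encoder's outputs are computed once and used as fixed regression targets. Presumably this is what lets the predictor supply a style at synthesis time without reference audio, as `synthesize` does here; the abstract itself does not spell out the inference procedure.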