End-to-end voice conversion model and training method and reasoning method thereof
Main authors: | , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Summary: | The invention provides an end-to-end voice conversion model together with its training and inference methods. The model is based on a conditional variational autoencoder; the acoustic model and the vocoder are trained jointly, which avoids a mismatch between training and inference. A large-scale pre-trained HuBERT model extracts the content representation, which preliminarily strips speaker information from that representation and enriches its initial and final information. Gradient reversal then strips further speaker information from the content representation, preventing timbre leakage. Codebook quantization reduces the complexity of the content representation and improves timbre stripping. In addition, a model distillation method based on KL divergence distills the computationally expensive content extractor into a student network with more efficient ca |
---|
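
The summary names two steps that are easy to illustrate in isolation: codebook quantization of the content representation and KL-divergence distillation from a heavy content extractor into a lighter student network. The following is a minimal sketch under stated assumptions, not the patented implementation. The record does not specify the codebook size, the distance measure, or what the teacher and student output, so the function names (`quantize`, `distillation_loss`), the squared Euclidean distance, and the idea that both networks emit per-frame logits over codebook entries are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn a list of logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def quantize(frame, codebook):
    """Map one content frame to its nearest codebook entry.

    Assumption: nearest means smallest squared Euclidean distance.
    Returns (index, code_vector).
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sqdist(frame, codebook[i]))
    return index, codebook[index]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean per-frame KL(teacher || student).

    Assumption: both networks output one logit vector per frame over
    the same set of classes (here, hypothetical codebook entries).
    """
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy 3-entry codebook
    index, code = quantize([0.9, 0.1], codebook)
    print("nearest code:", index, code)

    teacher = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]  # toy teacher logits, 2 frames
    student = [[1.0, 0.8, 0.2], [0.1, 0.9, 0.6]]  # toy student logits, 2 frames
    print("distillation loss:", distillation_loss(teacher, student))
```

In this sketch the loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. The gradient reversal step mentioned in the summary is not shown, since it depends on an automatic differentiation framework.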