Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech

Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation. Therefore, we propose a novel method to improve...

Bibliographic Details
Published in:	IEEE Access 2022, Vol. 10, pp. 52621-52629
Main authors:	Choi, Yeunju; Jung, Youngmoon; Suh, Youngjoo; Kim, Hoirin
Format:	Article
Language:	English
Subjects:	Data models; Distillation; Intelligibility; MOS prediction; neural TTS; perceptual loss; Prediction models; Predictions; Predictive models; Speech; Speech recognition; Speech synthesis; Task analysis; Training; Training data; Transformers
Online access:	Full text