MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | We present MParrotTTS, a unified multilingual, multi-speaker text-to-speech
(TTS) synthesis model that can produce high-quality speech. Benefiting from a
modularized training paradigm exploiting self-supervised speech
representations, MParrotTTS adapts to a new language with minimal supervised
data and generalizes to languages not seen while training the self-supervised
backbone. Moreover, without training on any bilingual or parallel examples,
MParrotTTS can transfer voices across languages while preserving the
speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using
a French speaker's voice and accent. We present extensive results on six
languages in terms of speech naturalness and speaker similarity in parallel and
cross-lingual synthesis. The proposed model outperforms the state-of-the-art
multilingual TTS models and baselines, using only a small fraction of
supervised training data. Speech samples from our model can be found at
https://paper2438.github.io/tts/ |
---|---|
DOI: | 10.48550/arxiv.2305.11926 |