Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | We compare using a PHOIBLE-based phone mapping method and using phonological
features input in transfer learning for TTS in low-resource languages. We use
diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and
target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to
test the language-independence of the methods and enhance the findings'
applicability. We use Character Error Rates from automatic speech recognition
and predicted Mean Opinion Scores for evaluation. Results show that both phone
mapping and features input improve the output quality and the latter performs
better, but these effects also depend on the specific language combination. We
also compare the recently-proposed Angular Similarity of Phone Frequencies
(ASPF) with a family tree-based distance measure as a criterion to select
source languages in transfer learning. ASPF proves effective if label-based
phone input is used, while the language distance does not have the expected
effects. |
---|---|
DOI: | 10.48550/arxiv.2306.12040 |
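
The abstract names Angular Similarity of Phone Frequencies (ASPF) as a criterion for selecting source languages but does not define it. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes each language is represented by the relative frequencies of its phones in a transcribed corpus, and that ASPF is the angular similarity 1 - 2·arccos(cos)/π between two such non-negative frequency vectors. The function names `phone_frequencies` and `aspf` and the toy phone sequences are hypothetical.

```python
import math
from collections import Counter


def phone_frequencies(phones, inventory):
    """Relative frequency of each phone in `inventory` within `phones`."""
    counts = Counter(phones)
    total = sum(counts.values())
    return [counts[p] / total for p in inventory]


def aspf(phones_a, phones_b):
    """Angular similarity of two phone-frequency vectors (assumed definition).

    Returns a value in [0, 1]: 1 for identical distributions,
    0 when the two phone sets do not overlap at all.
    """
    if not phones_a or not phones_b:
        raise ValueError("both phone sequences must be non-empty")
    # Use the union of both inventories so the vectors share one index space.
    inventory = sorted(set(phones_a) | set(phones_b))
    fa = phone_frequencies(phones_a, inventory)
    fb = phone_frequencies(phones_b, inventory)
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = math.sqrt(sum(x * x for x in fa)) * math.sqrt(sum(y * y for y in fb))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    # Frequencies are non-negative, so the angle lies in [0, pi/2].
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Toy phone sequences standing in for transcribed corpora (hypothetical data).
source = ["a", "k", "a", "t", "i", "s"]
target = ["a", "t", "a", "k", "u", "s"]
score = aspf(source, target)
```

Under this reading, a candidate source language with a higher score against the target would be preferred for transfer learning; the sketch compares phone labels only and ignores phonological features.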