EE-TTS: Emphatic Expressive TTS with Linguistic Information
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Abstract: | While current TTS systems synthesize high-quality speech well,
producing highly expressive speech remains a challenge. Emphasis, a critical
factor in the expressiveness of speech, has attracted growing
attention. Previous work usually enhances emphasis by adding
intermediate features, but this cannot guarantee the overall expressiveness of
the speech. To address this, we propose Emphatic Expressive TTS
(EE-TTS), which leverages multi-level linguistic information from syntax and
semantics. EE-TTS contains an emphasis predictor that identifies appropriate
emphasis positions from text and a conditioned acoustic model that synthesizes
expressive speech with emphasis and linguistic information. Experimental
results indicate that EE-TTS outperforms the baseline, with MOS improvements of 0.49
and 0.67 in expressiveness and naturalness, respectively. EE-TTS also shows strong
generalization across different datasets according to AB test results. |
---|---|
DOI: | 10.48550/arxiv.2305.12107 |