Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | This paper integrates a classic mel-cepstral synthesis filter into a modern
neural speech synthesis system, working towards end-to-end controllable speech
synthesis. Because the mel-cepstral synthesis filter is explicitly embedded in
the neural waveform models of the proposed system, both the voice characteristics
and the pitch of synthesized speech are highly controllable via a frequency warping
parameter and the fundamental frequency, respectively. We implement the
mel-cepstral synthesis filter as a differentiable and GPU-friendly module so
that the acoustic and waveform models in the proposed system can be
optimized simultaneously in an end-to-end manner. Experiments show that the
proposed system improves speech quality over a baseline system while maintaining
controllability. The core PyTorch modules used in the experiments will be
publicly available on GitHub. |
---|---|
DOI: | 10.48550/arxiv.2211.11222 |