FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Generative adversarial network (GAN) based vocoders have attracted significant
attention in speech synthesis for their high quality and fast inference speed.
However, noticeable spectral artifacts remain and degrade the quality of
synthesized speech. In this work, we present a novel GAN-based
vocoder designed for few artifacts and high fidelity, called FA-GAN. To
suppress the aliasing artifacts caused by non-ideal upsampling layers in
high-frequency components, we introduce the anti-aliased twin deconvolution
module in the generator. To alleviate blurring artifacts and enrich the
reconstruction of spectral details, we propose a novel fine-grained
multi-resolution real and imaginary loss to assist in the modeling of phase
information. Experimental results show that FA-GAN outperforms the compared
approaches in improving audio quality and alleviating spectral artifacts, and
exhibits superior performance in unseen-speaker scenarios. |
---|---|
DOI: | 10.48550/arxiv.2407.04575 |
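
The summary mentions a multi-resolution real and imaginary loss that helps model phase information. The record gives no formula for it, so the following is only a minimal sketch of the general idea: compare the real and imaginary parts of the STFT of a reference and a generated waveform at several resolutions. The function names `stft` and `real_imag_loss`, the L1 distance, the Hann window, and the three (FFT size, hop) pairs are all assumptions made for illustration, and the sketch does not attempt the "fine-grained" aspect the authors describe.

```python
import numpy as np


def stft(signal, n_fft, hop):
    """Complex STFT with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def real_imag_loss(reference, generated,
                   resolutions=((256, 64), (512, 128), (1024, 256))):
    """Mean L1 distance on real and imaginary STFT parts, averaged over resolutions.

    The resolutions and the L1 distance are illustrative assumptions,
    not values taken from the paper.
    """
    total = 0.0
    for n_fft, hop in resolutions:
        ref = stft(reference, n_fft, hop)
        gen = stft(generated, n_fft, hop)
        total += np.mean(np.abs(ref.real - gen.real))
        total += np.mean(np.abs(ref.imag - gen.imag))
    return total / len(resolutions)


# Usage: a sine wave against a phase-shifted copy of itself.
t = np.arange(4096) / 16000.0
reference = np.sin(2 * np.pi * 440.0 * t)
shifted = np.sin(2 * np.pi * 440.0 * t + 0.5 * np.pi)

print(real_imag_loss(reference, reference))  # identical signals
print(real_imag_loss(reference, shifted))    # same tone, different phase
```

Because the loss is computed on the complex spectrum rather than on magnitudes alone, a pure phase shift already yields a nonzero penalty, which is the property that makes this kind of term useful for phase modeling.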