Change your singer: a transfer learning generative adversarial framework for song to song conversion
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Have you ever wondered how a song might sound if performed by a different
artist? In this work, we propose SCM-GAN, an end-to-end non-parallel song
conversion system powered by generative adversarial and transfer learning that
lets users hear a selected target singer singing any song. SCM-GAN
first separates songs into vocals and instrumental music using a U-Net,
then converts the vocal segments to the target singer using an advanced
CycleGAN-VC, before merging the converted vocals with their corresponding
background music. SCM-GAN is first initialized with feature representations
learned from a state-of-the-art voice-to-voice conversion model and then trained on a
dataset of non-parallel songs. Furthermore, SCM-GAN is evaluated against a set
of metrics including global variance (GV) and modulation spectra (MS) on the 24
Mel-cepstral coefficients (MCEPs). Transfer learning improves the GV by 35% and
the MS by 13% on average. A subjective comparison is conducted to test user
satisfaction with the quality and naturalness of the conversion. Results
show above-par similarity between SCM-GAN's output and the target (70% on
average) as well as great naturalness of the converted songs. |
---|---|
DOI: | 10.48550/arxiv.1911.02933 |
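
The abstract names two objective metrics, global variance (GV) and modulation spectra (MS), computed on 24 MCEPs. As a rough illustration of what such metrics measure, the sketch below computes both on a synthetic MCEP matrix with NumPy. It is not the authors' evaluation code: the function names, the `n_fft` value, the mean removal before the FFT, and the random input data are all assumptions made for this example.

```python
import numpy as np


def global_variance(mcep):
    """Variance of each coefficient over time; mcep has shape (frames, coeffs)."""
    return np.var(mcep, axis=0)


def modulation_spectrum(mcep, n_fft=256):
    """Log power spectrum of each coefficient's trajectory over time.

    Assumption: the trajectory is mean-removed and zero-padded to n_fft frames.
    """
    centered = mcep - mcep.mean(axis=0, keepdims=True)
    spectrum = np.fft.rfft(centered, n=n_fft, axis=0)
    # Small constant avoids log(0) for silent or constant trajectories.
    return np.log(np.abs(spectrum) ** 2 + 1e-12)


# Synthetic stand-in for 200 frames of 24 MCEPs (hypothetical data).
rng = np.random.default_rng(0)
mcep = rng.normal(size=(200, 24))

gv = global_variance(mcep)      # one value per coefficient
ms = modulation_spectrum(mcep)  # (n_fft // 2 + 1) modulation bins per coefficient
```

In practice the MCEPs would come from a vocoder analysis of converted and target vocals, and the two sets of GV and MS values would be compared coefficient by coefficient.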