MACHINE LEARNING (ML) BASED EMOTION AND VOICE CONVERSION IN AUDIO USING VIRTUAL DOMAIN MIXING AND FAKE PAIR-MASKING
Main authors: | , , , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Summary: | An electronic device and method for machine learning (ML) based emotion and voice conversion in audio using virtual domain mixing and fake pair-masking are disclosed. The electronic device receives a source audio associated with a first user, a reference-speaker audio associated with a second user, and a reference-emotion audio associated with a third user. The electronic device applies a set of ML models to generate a converted audio. The converted audio is associated with the content of the source audio, the identity of the second user, and the emotion of the third user. The electronic device applies both a source speaker classifier and a source emotion classifier to the converted audio, and re-trains an adversarial model. Based on the re-training, the adversarial model may allow conversion of an input audio to an output audio associated with the identity of the second user and the emotion of the third user. |
---|