MACHINE LEARNING (ML) BASED EMOTION AND VOICE CONVERSION IN AUDIO USING VIRTUAL DOMAIN MIXING AND FAKE PAIR-MASKING

An electronic device and method for machine learning (ML) based emotion and voice conversion in audio using virtual domain mixing and fake pair-masking is disclosed. The electronic device receives a source audio associated with a first user, a reference-speaker audio associated with a second user, a...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	TAKAHASHI, NAOYA, SINGH, MAYANK KUMAR, SHAH, NIRMESH, ONOE, NAOYUKI
Format:	Patent
Sprache:	eng
Schlagworte:	ACOUSTICS MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	An electronic device and method for machine learning (ML) based emotion and voice conversion in audio using virtual domain mixing and fake pair-masking is disclosed. The electronic device receives a source audio associated with a first user, a reference-speaker audio associated with a second user, and a reference-emotion audio associated with a third user. The electronic device applies a set of ML models to generate a converted audio. The generated converted audio is associated with content of the source audio, an identity of the second user and an emotion of the third user. The electronic device applies each of a source speaker classifier and a source emotion classifier on the converted audio, and re-trains an adversarial model. Based on the re-training, the adversarial model may allow conversion of an input audio to an output audio associated with the identity of the second user and the emotion of the third user.