ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF

A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user, generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and th...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	FASHANDI, Homa, SELVAKUMARASINGAM, Anith
Format:	Patent
Sprache:	eng
Schlagworte:	ACOUSTICS CALCULATING COMPUTING COUNTING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user, generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and the audio embedding being multi-dimensional vectors, generating, via the neural network, an audio-visual embedding based on a combination of the visual and audio embeddings. The method can further include determining a specific pre-enrolled audio-visual embedding from among pre-enrolled audio-visual embeddings corresponding pre-enrolled users based on a distance away from the audio-visual embedding within a joint audio-visual subspace and verifying the user as the specific pre-enrolled user. Also, the neural network can be trained based on a loss function that uses a plurality of audio-visual embeddings, each including an audio component and a visual component.