ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF
Main authors: | , |
---|---|
Format: | Patent |
Language: | English |
Subjects: | |
Summary: | A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user; generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and the audio embedding being multi-dimensional vectors; and generating, via the neural network, an audio-visual embedding based on a combination of the visual and audio embeddings. The method can further include determining a specific pre-enrolled audio-visual embedding, from among pre-enrolled audio-visual embeddings corresponding to pre-enrolled users, based on its distance from the audio-visual embedding within a joint audio-visual subspace, and verifying the user as the pre-enrolled user to whom that embedding corresponds. Also, the neural network can be trained based on a loss function that uses a plurality of audio-visual embeddings, each including an audio component and a visual component. |
---|
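
The verification step in the summary (combine the two embeddings, then pick the pre-enrolled embedding closest to the result) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the patent: the `fuse` function uses plain concatenation with L2 normalization as a stand-in for the learned neural fusion, the distance is Euclidean, and the `threshold` value, the function names, and the toy two-dimensional vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse(visual, audio):
    """Combine a visual and an audio embedding into one audio-visual embedding.

    Assumption: concatenation of the normalized parts stands in for the
    learned fusion that the summary attributes to the neural network.
    """
    return l2_normalize(l2_normalize(visual) + l2_normalize(audio))


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(query, enrolled, threshold=0.5):
    """Return (user, distance) for the nearest enrolled embedding.

    The user is None when the nearest embedding is farther away than
    `threshold` (a hypothetical acceptance cutoff).
    """
    best_user, best_dist = None, math.inf
    for user, embedding in enrolled.items():
        dist = euclidean(query, embedding)
        if dist < best_dist:
            best_user, best_dist = user, dist
    if best_dist <= threshold:
        return best_user, best_dist
    return None, best_dist


# Toy enrollment database with hypothetical users and 2-D component embeddings.
enrolled = {
    "alice": fuse([1.0, 0.0], [0.0, 1.0]),
    "bob": fuse([0.0, 1.0], [1.0, 0.0]),
}

# A probe whose visual and audio parts are both close to alice's.
probe = fuse([0.9, 0.1], [0.1, 0.9])
matched_user, distance = verify(probe, enrolled)
print(matched_user, round(distance, 3))
```

In this sketch a probe that matches no enrolled user closely enough is rejected rather than assigned to the nearest one; whether the patented method applies such a cutoff is not stated in the summary.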