ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF

A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user, generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and th...

Detailed description

Bibliographic details
Main authors:	FASHANDI, Homa; SELVAKUMARASINGAM, Anith
Format:	Patent
Language:	eng
Subjects:	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	FASHANDI, Homa ; SELVAKUMARASINGAM, Anith
description	A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user, generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and the audio embedding being multi-dimensional vectors, and generating, via the neural network, an audio-visual embedding based on a combination of the visual and audio embeddings. The method can further include determining a specific pre-enrolled audio-visual embedding from among pre-enrolled audio-visual embeddings corresponding to pre-enrolled users based on a distance away from the audio-visual embedding within a joint audio-visual subspace and verifying the user as the specific pre-enrolled user. Also, the neural network can be trained based on a loss function that uses a plurality of audio-visual embeddings, each including an audio component and a visual component.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2024347065A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2024347065A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2024347065A13</originalsourceid><addsrcrecordid>eNqNyrEKwjAQgOEuDqK-w4GzUNuqc0wuNpDm5HJ1LUXiJFqs749FfACnf_i_efZSLM467ZQHFwS9dycMGsHgxU2xxMB0bKNA03pxDZlJToIM8veekSMFYDwzRgyixFGIoIIBTUGYPDQoNRmQGhnJLrPZrb-PafXrIltbFF1v0vDs0jj01_RI766NRV5UZXXI9zu1Lf9TH1qaOa4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF</title><source>esp@cenet</source><creator>FASHANDI, Homa ; SELVAKUMARASINGAM, Anith</creator><creatorcontrib>FASHANDI, Homa ; SELVAKUMARASINGAM, Anith</creatorcontrib><description>A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user, generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and the audio embedding being multi-dimensional vectors, generating, via the neural network, an audio-visual embedding based on a combination of the visual and audio embeddings. The method can further include determining a specific pre-enrolled audio-visual embedding from among pre-enrolled audio-visual embeddings corresponding pre-enrolled users based on a distance away from the audio-visual embedding within a joint audio-visual subspace and verifying the user as the specific pre-enrolled user. 
Also, the neural network can be trained based on a loss function that uses a plurality of audio-visual embeddings, each including an audio component and a visual component.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTING ; COUNTING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20241017&DB=EPODOC&CC=US&NR=2024347065A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25564,76547</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20241017&DB=EPODOC&CC=US&NR=2024347065A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>FASHANDI, Homa</creatorcontrib><creatorcontrib>SELVAKUMARASINGAM, Anith</creatorcontrib><title>ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF</title><description>A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user, generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and the audio embedding being multi-dimensional vectors, generating, via the neural network, an audio-visual embedding based on a combination of the visual and audio embeddings. 
The method can further include determining a specific pre-enrolled audio-visual embedding from among pre-enrolled audio-visual embeddings corresponding pre-enrolled users based on a distance away from the audio-visual embedding within a joint audio-visual subspace and verifying the user as the specific pre-enrolled user. Also, the neural network can be trained based on a loss function that uses a plurality of audio-visual embeddings, each including an audio component and a visual component.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2024</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNqNyrEKwjAQgOEuDqK-w4GzUNuqc0wuNpDm5HJ1LUXiJFqs749FfACnf_i_efZSLM467ZQHFwS9dycMGsHgxU2xxMB0bKNA03pxDZlJToIM8veekSMFYDwzRgyixFGIoIIBTUGYPDQoNRmQGhnJLrPZrb-PafXrIltbFF1v0vDs0jj01_RI766NRV5UZXXI9zu1Lf9TH1qaOa4</recordid><startdate>20241017</startdate><enddate>20241017</enddate><creator>FASHANDI, Homa</creator><creator>SELVAKUMARASINGAM, Anith</creator><scope>EVB</scope></search><sort><creationdate>20241017</creationdate><title>ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF</title><author>FASHANDI, Homa ; SELVAKUMARASINGAM, Anith</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2024347065A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2024</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>MUSICAL 
INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>FASHANDI, Homa</creatorcontrib><creatorcontrib>SELVAKUMARASINGAM, Anith</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>FASHANDI, Homa</au><au>SELVAKUMARASINGAM, Anith</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF</title><date>2024-10-17</date><risdate>2024</risdate><abstract>A method for controlling an artificial intelligence (AI) device can include obtaining a video sample of a user and an audio sample of the user, generating, via a neural network, a visual embedding based on the video sample and an audio embedding based on the audio sample, the visual embedding and the audio embedding being multi-dimensional vectors, generating, via the neural network, an audio-visual embedding based on a combination of the visual and audio embeddings. The method can further include determining a specific pre-enrolled audio-visual embedding from among pre-enrolled audio-visual embeddings corresponding pre-enrolled users based on a distance away from the audio-visual embedding within a joint audio-visual subspace and verifying the user as the specific pre-enrolled user. Also, the neural network can be trained based on a loss function that uses a plurality of audio-visual embeddings, each including an audio component and a visual component.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng
recordid	cdi_epo_espacenet_US2024347065A1
source	esp@cenet
subjects	ACOUSTICS ; CALCULATING ; COMPUTING ; COUNTING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION
title	ARTIFICIAL INTELLIGENCE DEVICE FOR ROBUST MULTIMODAL ENCODER FOR PERSON REPRESENTATIONS AND CONTROL METHOD THEREOF
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T16%3A30%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=FASHANDI,%20Homa&rft.date=2024-10-17&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2024347065A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
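
The verification step summarized in the description field (fuse a visual and an audio embedding, then pick the pre-enrolled audio-visual embedding at the smallest distance in the joint subspace) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the record does not say how the neural network combines the two embeddings, which distance it uses, or whether a threshold is applied. Here plain concatenation with L2 normalization stands in for the learned fusion, Euclidean distance stands in for the subspace distance, and the names `fuse`, `nearest_enrolled`, `verify`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(visual: Vector, audio: Vector) -> Vector:
    """Assumed stand-in for the learned fusion: concatenate, then normalize."""
    return l2_normalize(list(visual) + list(audio))


def nearest_enrolled(query: Vector, enrolled: Dict[str, Vector]) -> Tuple[str, float]:
    """Return the enrolled user id closest to the query, with its distance."""
    if not enrolled:
        raise ValueError("no enrolled users")
    # Euclidean distance is an assumption; the record only says "a distance".
    best_id = min(enrolled, key=lambda uid: math.dist(query, enrolled[uid]))
    return best_id, math.dist(query, enrolled[best_id])


def verify(query: Vector, enrolled: Dict[str, Vector], threshold: float) -> Optional[str]:
    """Accept the nearest enrolled user only if it lies within the threshold.

    The threshold is a hypothetical parameter, not mentioned in the record.
    """
    user_id, distance = nearest_enrolled(query, enrolled)
    return user_id if distance <= threshold else None


# Toy enrollment with two users and 2-dimensional per-modality embeddings.
enrolled = {
    "alice": fuse([1.0, 0.0], [0.0, 1.0]),
    "bob": fuse([0.0, 1.0], [1.0, 0.0]),
}
probe = fuse([0.9, 0.1], [0.1, 0.9])

print(verify(probe, enrolled, threshold=0.5))   # → alice
print(verify(probe, enrolled, threshold=0.05))  # → None
```

In a real system the embeddings would come from trained encoders and have far more dimensions; the toy vectors here only show the nearest-neighbor lookup and the accept/reject decision.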