INTELLIGENT MULTI-CAMERA SWITCHING WITH MACHINE LEARNING

Multiple cameras in a conference room, each pointed in a different direction and including a microphone array to perform sound source localization (SSL). The SSL is used in combination with the video image to identify the speaker from among multiple individuals that appear in the video image. Neural...
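
The selection logic summarized in this record's abstract (match the SSL direction to a face, score that speaker's frontal view per camera, fall back to a default view) can be illustrated with a minimal sketch. This is not the patent's implementation: the names `Face`, `CameraView`, `pick_speaker` and `select_camera`, the angle tolerance, the score threshold, and the idea that an ML model has already produced a `frontal_score` in [0, 1] are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Face:
    azimuth_deg: float    # horizontal direction of the face as seen from the camera
    frontal_score: float  # hypothetical ML output in [0, 1]; higher means more frontal


@dataclass
class CameraView:
    camera_id: str
    ssl_azimuth_deg: Optional[float]  # SSL direction from this camera's mic array, None if silent
    faces: list[Face]


def pick_speaker(view: CameraView, max_error_deg: float = 10.0) -> Optional[Face]:
    """Return the detected face closest to the SSL direction, if close enough."""
    if view.ssl_azimuth_deg is None or not view.faces:
        return None
    nearest = min(view.faces, key=lambda f: abs(f.azimuth_deg - view.ssl_azimuth_deg))
    if abs(nearest.azimuth_deg - view.ssl_azimuth_deg) > max_error_deg:
        return None
    return nearest


def select_camera(views: list[CameraView], default_camera_id: str,
                  min_score: float = 0.6) -> str:
    """Pick the camera with the best frontal view of the speaker, else the default."""
    best_id = default_camera_id
    best_score = None
    for view in views:
        speaker = pick_speaker(view)
        if speaker is None or speaker.frontal_score < min_score:
            continue  # no speaker found, or the view is not satisfactory
        if best_score is None or speaker.frontal_score > best_score:
            best_id, best_score = view.camera_id, speaker.frontal_score
    return best_id


views = [
    # Camera A: the speaker (near 30 degrees) is seen mostly in profile.
    CameraView("A", 30.0, [Face(28.0, 0.40), Face(-20.0, 0.95)]),
    # Camera B: the speaker (near -15 degrees) is seen nearly head-on.
    CameraView("B", -15.0, [Face(-14.0, 0.90), Face(40.0, 0.20)]),
]
print(select_camera(views, default_camera_id="wide"))
```

Only the face matched to the SSL direction is scored, so the silent participant with the 0.95 score in camera A does not influence the choice. That mirrors the abstract's point that SSL restricts the facial-view analysis to the speaker.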

Detailed description

Bibliographic details
Main authors:	CHU, Peter L; BRYAN, David A; KULKARNI, Varun Ajay; YAN, Yong; WANG, Xiangdong; WANG, Jian David; SPEARMAN, John Paul
Format:	Patent
Language:	eng
Subjects:	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; ELECTRIC COMMUNICATION TECHNIQUE; ELECTRICITY; HANDLING RECORD CARRIERS; MUSICAL INSTRUMENTS; PHYSICS; PICTORIAL COMMUNICATION, e.g. TELEVISION; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	CHU, Peter L; BRYAN, David A; KULKARNI, Varun Ajay; YAN, Yong; WANG, Xiangdong; WANG, Jian David; SPEARMAN, John Paul
description	Multiple cameras in a conference room, each pointed in a different direction and including a microphone array to perform sound source localization (SSL). The SSL is used in combination with the video image to identify the speaker from among multiple individuals that appear in the video image. Neural network or machine learning processing is performed on the identified speaker to determine the quality of the front or facial view of the speaker. The best view of the speaker's face from the various cameras is selected to be provided to the far end. If no view is satisfactory, a default view is selected and that is provided to the far end. The use of the SSL allows selection of the proper individual from a group of individuals in the conference room, so that only the speaker's head is analyzed for the best facial view and then framed for transmission.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2022400216A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2022400216A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2022400216A13</originalsourceid><addsrcrecordid>eNrjZLDw9Atx9fHxdHf1C1HwDfUJ8dR1dvR1DXJUCA73DHH28PRzVwAyPBR8HUEcVwUfV8cgP6AoDwNrWmJOcSovlOZmUHZzBWrQTS3Ij08tLkhMTs1LLYkPDTYyMDIyMTAwMjRzNDQmThUAFVEpag</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>INTELLIGENT MULTI-CAMERA SWITCHING WITH MACHINE LEARNING</title><source>esp@cenet</source><creator>CHU, Peter L ; BRYAN, David A ; KULKARNI, Varun Ajay ; YAN, Yong ; WANG, Xiangdong ; WANG, Jian David ; SPEARMAN, John Paul</creator><creatorcontrib>CHU, Peter L ; BRYAN, David A ; KULKARNI, Varun Ajay ; YAN, Yong ; WANG, Xiangdong ; WANG, Jian David ; SPEARMAN, John Paul</creatorcontrib><description>Multiple cameras in a conference room, each pointed in a different direction and including a microphone array to perform sound source localization (SSL). The SSL is used in combination with the video image to identify the speaker from among multiple individuals that appear in the video image. Neural network or machine learning processing is performed on the identified speaker to determine the quality of the front or facial view of the speaker. The best view of the speaker's face from the various cameras is selected to be provided to the far end. If no view is satisfactory, a default view is selected and that is provided to the far end. 
The use of the SSL allows selection of the proper individual from a group of individuals in the conference room, so that only the speaker's head is analyzed for the best facial view and then framed for transmission.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC COMMUNICATION TECHNIQUE ; ELECTRICITY ; HANDLING RECORD CARRIERS ; MUSICAL INSTRUMENTS ; PHYSICS ; PICTORIAL COMMUNICATION, e.g. TELEVISION ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2022</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20221215&DB=EPODOC&CC=US&NR=2022400216A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25562,76317</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20221215&DB=EPODOC&CC=US&NR=2022400216A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>CHU, Peter L</creatorcontrib><creatorcontrib>BRYAN, David A</creatorcontrib><creatorcontrib>KULKARNI, Varun Ajay</creatorcontrib><creatorcontrib>YAN, Yong</creatorcontrib><creatorcontrib>WANG, Xiangdong</creatorcontrib><creatorcontrib>WANG, Jian David</creatorcontrib><creatorcontrib>SPEARMAN, John Paul</creatorcontrib><title>INTELLIGENT MULTI-CAMERA SWITCHING WITH MACHINE LEARNING</title><description>Multiple cameras in a conference room, each pointed in a different direction and including a microphone array to perform sound source localization (SSL). 
The SSL is used in combination with the video image to identify the speaker from among multiple individuals that appear in the video image. Neural network or machine learning processing is performed on the identified speaker to determine the quality of the front or facial view of the speaker. The best view of the speaker's face from the various cameras is selected to be provided to the far end. If no view is satisfactory, a default view is selected and that is provided to the far end. The use of the SSL allows selection of the proper individual from a group of individuals in the conference room, so that only the speaker's head is analyzed for the best facial view and then framed for transmission.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC COMMUNICATION TECHNIQUE</subject><subject>ELECTRICITY</subject><subject>HANDLING RECORD CARRIERS</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>PICTORIAL COMMUNICATION, e.g. 
TELEVISION</subject><subject>PRESENTATION OF DATA</subject><subject>RECOGNITION OF DATA</subject><subject>RECORD CARRIERS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2022</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZLDw9Atx9fHxdHf1C1HwDfUJ8dR1dvR1DXJUCA73DHH28PRzVwAyPBR8HUEcVwUfV8cgP6AoDwNrWmJOcSovlOZmUHZzBWrQTS3Ij08tLkhMTs1LLYkPDTYyMDIyMTAwMjRzNDQmThUAFVEpag</recordid><startdate>20221215</startdate><enddate>20221215</enddate><creator>CHU, Peter L</creator><creator>BRYAN, David A</creator><creator>KULKARNI, Varun Ajay</creator><creator>YAN, Yong</creator><creator>WANG, Xiangdong</creator><creator>WANG, Jian David</creator><creator>SPEARMAN, John Paul</creator><scope>EVB</scope></search><sort><creationdate>20221215</creationdate><title>INTELLIGENT MULTI-CAMERA SWITCHING WITH MACHINE LEARNING</title><author>CHU, Peter L ; BRYAN, David A ; KULKARNI, Varun Ajay ; YAN, Yong ; WANG, Xiangdong ; WANG, Jian David ; SPEARMAN, John Paul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2022400216A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2022</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC COMMUNICATION TECHNIQUE</topic><topic>ELECTRICITY</topic><topic>HANDLING RECORD CARRIERS</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>PICTORIAL COMMUNICATION, e.g. 
TELEVISION</topic><topic>PRESENTATION OF DATA</topic><topic>RECOGNITION OF DATA</topic><topic>RECORD CARRIERS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>CHU, Peter L</creatorcontrib><creatorcontrib>BRYAN, David A</creatorcontrib><creatorcontrib>KULKARNI, Varun Ajay</creatorcontrib><creatorcontrib>YAN, Yong</creatorcontrib><creatorcontrib>WANG, Xiangdong</creatorcontrib><creatorcontrib>WANG, Jian David</creatorcontrib><creatorcontrib>SPEARMAN, John Paul</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>CHU, Peter L</au><au>BRYAN, David A</au><au>KULKARNI, Varun Ajay</au><au>YAN, Yong</au><au>WANG, Xiangdong</au><au>WANG, Jian David</au><au>SPEARMAN, John Paul</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>INTELLIGENT MULTI-CAMERA SWITCHING WITH MACHINE LEARNING</title><date>2022-12-15</date><risdate>2022</risdate><abstract>Multiple cameras in a conference room, each pointed in a different direction and including a microphone array to perform sound source localization (SSL). The SSL is used in combination with the video image to identify the speaker from among multiple individuals that appear in the video image. Neural network or machine learning processing is performed on the identified speaker to determine the quality of the front or facial view of the speaker. The best view of the speaker's face from the various cameras is selected to be provided to the far end. If no view is satisfactory, a default view is selected and that is provided to the far end. 
The use of the SSL allows selection of the proper individual from a group of individuals in the conference room, so that only the speaker's head is analyzed for the best facial view and then framed for transmission.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng
recordid	cdi_epo_espacenet_US2022400216A1
source	esp@cenet
subjects	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; ELECTRIC COMMUNICATION TECHNIQUE; ELECTRICITY; HANDLING RECORD CARRIERS; MUSICAL INSTRUMENTS; PHYSICS; PICTORIAL COMMUNICATION, e.g. TELEVISION; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
title	INTELLIGENT MULTI-CAMERA SWITCHING WITH MACHINE LEARNING
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T21%3A57%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=CHU,%20Peter%20L&rft.date=2022-12-15&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2022400216A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true