JOINT ACOUSTIC AND VISUAL PROCESSING

An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrievals of images according to voice queries. A set of paired images and audio signals are processed without requiring transcription, segmentation, or annotation of either...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Glass James R, Harwath David F
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Glass James R
Harwath David F
description An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrievals of images according to voice queries. A set of paired images and audio signals are processed without requiring transcription, segmentation, or annotation of either the images or the audio. This processing of the paired images and audio is used to determine parameters of an image processor and an audio processor, with the outputs of these processors being comparable to determine a similarity across acoustic and visual modalities. In some implementations, the image processor and the audio processor make use of deep neural networks. Further embodiments associate parts of images with corresponding parts of audio signals.
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2018039859A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2018039859A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2018039859A13</originalsourceid><addsrcrecordid>eNrjZFDx8vf0C1FwdPYPDQ7xdFZw9HNRCPMMDnX0UQgI8nd2DQ729HPnYWBNS8wpTuWF0twMym6uIc4euqkF-fGpxQWJyal5qSXxocFGBoYWBsaWFqaWjobGxKkCABLGJBw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>JOINT ACOUSTIC AND VISUAL PROCESSING</title><source>esp@cenet</source><creator>Glass James R ; Harwath David F</creator><creatorcontrib>Glass James R ; Harwath David F</creatorcontrib><description>An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrievals of images according to voice queries. A set of paired images and audio signals are processed without requiring transcription, segmentation, or annotation of either the images or the audio. This processing of the paired images and audio is used to determine parameters of an image processor and an audio processor, with the outputs of these processors being comparable to determine a similarity across acoustic and visual modalities. In some implementations, the image processor and the audio processor make use of deep neural networks. Further embodiments associate parts of images with corresponding parts of audio signals.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2018</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20180208&amp;DB=EPODOC&amp;CC=US&amp;NR=2018039859A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76290</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20180208&amp;DB=EPODOC&amp;CC=US&amp;NR=2018039859A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Glass James R</creatorcontrib><creatorcontrib>Harwath David F</creatorcontrib><title>JOINT ACOUSTIC AND VISUAL PROCESSING</title><description>An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrievals of images according to voice queries. A set of paired images and audio signals are processed without requiring transcription, segmentation, or annotation of either the images or the audio. This processing of the paired images and audio is used to determine parameters of an image processor and an audio processor, with the outputs of these processors being comparable to determine a similarity across acoustic and visual modalities. In some implementations, the image processor and the audio processor make use of deep neural networks. Further embodiments associate parts of images with corresponding parts of audio signals.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2018</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZFDx8vf0C1FwdPYPDQ7xdFZw9HNRCPMMDnX0UQgI8nd2DQ729HPnYWBNS8wpTuWF0twMym6uIc4euqkF-fGpxQWJyal5qSXxocFGBoYWBsaWFqaWjobGxKkCABLGJBw</recordid><startdate>20180208</startdate><enddate>20180208</enddate><creator>Glass James R</creator><creator>Harwath David F</creator><scope>EVB</scope></search><sort><creationdate>20180208</creationdate><title>JOINT ACOUSTIC AND VISUAL PROCESSING</title><author>Glass James R ; Harwath David F</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2018039859A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2018</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>Glass James R</creatorcontrib><creatorcontrib>Harwath David F</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Glass James R</au><au>Harwath David F</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>JOINT ACOUSTIC AND VISUAL PROCESSING</title><date>2018-02-08</date><risdate>2018</risdate><abstract>An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrievals of images according to voice queries. A set of paired images and audio signals are processed without requiring transcription, segmentation, or annotation of either the images or the audio. This processing of the paired images and audio is used to determine parameters of an image processor and an audio processor, with the outputs of these processors being comparable to determine a similarity across acoustic and visual modalities. In some implementations, the image processor and the audio processor make use of deep neural networks. Further embodiments associate parts of images with corresponding parts of audio signals.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US2018039859A1
source esp@cenet
subjects ACOUSTICS
CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
title JOINT ACOUSTIC AND VISUAL PROCESSING
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T09%3A12%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Glass%20James%20R&rft.date=2018-02-08&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2018039859A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true