VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the vid...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Shillingford, Brendan Gomes de Freitas, Joao Ferdinando Assael, Ioannis Alexandros |
description | Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores. |
format | Patent |
fullrecord | <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2021110831A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2021110831A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2021110831A13</originalsourceid><addsrcrecordid>eNrjZNAP8wwOdfRRCA5wdXX2UAhydfZ39_MM8fT3U3CKVAjw8Pdz9XVVCAhydfF0BonyMLCmJeYUp_JCaW4GZTfXEGcP3dSC_PjU4oLE5NS81JL40GAjAyNDQ0MDC2NDR0Nj4lQBAKrgJvU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><source>esp@cenet</source><creator>Shillingford, Brendan ; Gomes de Freitas, Joao Ferdinando ; Assael, Ioannis Alexandros</creator><creatorcontrib>Shillingford, Brendan ; Gomes de Freitas, Joao Ferdinando ; Assael, Ioannis Alexandros</creatorcontrib><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; HANDLING RECORD CARRIERS ; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL ; MUSICAL INSTRUMENTS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210415&DB=EPODOC&CC=US&NR=2021110831A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,778,883,25547,76298</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210415&DB=EPODOC&CC=US&NR=2021110831A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Shillingford, Brendan</creatorcontrib><creatorcontrib>Gomes de Freitas, Joao Ferdinando</creatorcontrib><creatorcontrib>Assael, Ioannis Alexandros</creatorcontrib><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>HANDLING RECORD CARRIERS</subject><subject>IMAGE DATA PROCESSING OR GENERATION, IN GENERAL</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>PRESENTATION OF DATA</subject><subject>RECOGNITION OF DATA</subject><subject>RECORD CARRIERS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2021</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZNAP8wwOdfRRCA5wdXX2UAhydfZ39_MM8fT3U3CKVAjw8Pdz9XVVCAhydfF0BonyMLCmJeYUp_JCaW4GZTfXEGcP3dSC_PjU4oLE5NS81JL40GAjAyNDQ0MDC2NDR0Nj4lQBAKrgJvU</recordid><startdate>20210415</startdate><enddate>20210415</enddate><creator>Shillingford, Brendan</creator><creator>Gomes de Freitas, Joao Ferdinando</creator><creator>Assael, Ioannis Alexandros</creator><scope>EVB</scope></search><sort><creationdate>20210415</creationdate><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><author>Shillingford, Brendan ; Gomes de Freitas, Joao Ferdinando ; Assael, Ioannis Alexandros</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2021110831A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2021</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>HANDLING RECORD CARRIERS</topic><topic>IMAGE DATA PROCESSING OR GENERATION, IN GENERAL</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>PRESENTATION OF DATA</topic><topic>RECOGNITION OF DATA</topic><topic>RECORD CARRIERS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>Shillingford, Brendan</creatorcontrib><creatorcontrib>Gomes de Freitas, Joao Ferdinando</creatorcontrib><creatorcontrib>Assael, Ioannis Alexandros</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shillingford, Brendan</au><au>Gomes de Freitas, Joao Ferdinando</au><au>Assael, Ioannis Alexandros</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><date>2021-04-15</date><risdate>2021</risdate><abstract>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.</abstract><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | |
ispartof | |
issn | |
language | eng |
recordid | cdi_epo_espacenet_US2021110831A1 |
source | esp@cenet |
subjects | ACOUSTICS CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING COUNTING HANDLING RECORD CARRIERS IMAGE DATA PROCESSING OR GENERATION, IN GENERAL MUSICAL INSTRUMENTS PHYSICS PRESENTATION OF DATA RECOGNITION OF DATA RECORD CARRIERS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION |
title | VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T23%3A16%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Shillingford,%20Brendan&rft.date=2021-04-15&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2021110831A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |