VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the vid...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Shillingford, Brendan, Gomes de Freitas, Joao Ferdinando, Assael, Ioannis Alexandros
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Shillingford, Brendan
Gomes de Freitas, Joao Ferdinando
Assael, Ioannis Alexandros
description Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2021110831A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2021110831A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2021110831A13</originalsourceid><addsrcrecordid>eNrjZNAP8wwOdfRRCA5wdXX2UAhydfZ39_MM8fT3U3CKVAjw8Pdz9XVVCAhydfF0BonyMLCmJeYUp_JCaW4GZTfXEGcP3dSC_PjU4oLE5NS81JL40GAjAyNDQ0MDC2NDR0Nj4lQBAKrgJvU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><source>esp@cenet</source><creator>Shillingford, Brendan ; Gomes de Freitas, Joao Ferdinando ; Assael, Ioannis Alexandros</creator><creatorcontrib>Shillingford, Brendan ; Gomes de Freitas, Joao Ferdinando ; Assael, Ioannis Alexandros</creatorcontrib><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; HANDLING RECORD CARRIERS ; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL ; MUSICAL INSTRUMENTS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20210415&amp;DB=EPODOC&amp;CC=US&amp;NR=2021110831A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,778,883,25547,76298</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20210415&amp;DB=EPODOC&amp;CC=US&amp;NR=2021110831A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Shillingford, Brendan</creatorcontrib><creatorcontrib>Gomes de Freitas, Joao Ferdinando</creatorcontrib><creatorcontrib>Assael, Ioannis Alexandros</creatorcontrib><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>HANDLING RECORD CARRIERS</subject><subject>IMAGE DATA PROCESSING OR GENERATION, IN GENERAL</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>PRESENTATION OF DATA</subject><subject>RECOGNITION OF DATA</subject><subject>RECORD CARRIERS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2021</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZNAP8wwOdfRRCA5wdXX2UAhydfZ39_MM8fT3U3CKVAjw8Pdz9XVVCAhydfF0BonyMLCmJeYUp_JCaW4GZTfXEGcP3dSC_PjU4oLE5NS81JL40GAjAyNDQ0MDC2NDR0Nj4lQBAKrgJvU</recordid><startdate>20210415</startdate><enddate>20210415</enddate><creator>Shillingford, Brendan</creator><creator>Gomes de Freitas, Joao Ferdinando</creator><creator>Assael, Ioannis Alexandros</creator><scope>EVB</scope></search><sort><creationdate>20210415</creationdate><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><author>Shillingford, Brendan ; Gomes de Freitas, Joao Ferdinando ; Assael, Ioannis Alexandros</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2021110831A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2021</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>HANDLING RECORD CARRIERS</topic><topic>IMAGE DATA PROCESSING OR GENERATION, IN GENERAL</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>PRESENTATION OF DATA</topic><topic>RECOGNITION OF DATA</topic><topic>RECORD CARRIERS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>Shillingford, Brendan</creatorcontrib><creatorcontrib>Gomes de Freitas, Joao Ferdinando</creatorcontrib><creatorcontrib>Assael, Ioannis Alexandros</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shillingford, Brendan</au><au>Gomes de Freitas, Joao Ferdinando</au><au>Assael, Ioannis Alexandros</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION</title><date>2021-04-15</date><risdate>2021</risdate><abstract>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US2021110831A1
source esp@cenet
subjects ACOUSTICS
CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
HANDLING RECORD CARRIERS
IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
MUSICAL INSTRUMENTS
PHYSICS
PRESENTATION OF DATA
RECOGNITION OF DATA
RECORD CARRIERS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
title VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T23%3A16%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Shillingford,%20Brendan&rft.date=2021-04-15&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2021110831A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true