Residual context refinement network architecture for optical character recognition

Techniques for recognizing text in an image are described. An exemplary method may include receiving a request to recognize text in an image; extracting features from the image and generating a visual feature sequence from the extracted features; performing selective contextual refinement using at least one selective contextual refinement block of a stack of selective contextual refinement blocks to generate a text prediction by: generating a contextual feature map and combining the contextual feature map with the visual feature sequence into a visual feature space, and applying a selective decoder that utilizes a two-step attention on the visual feature space to generate the text prediction, wherein the two-step attention includes performing a 1-D self-attention computation to generate attentional features and decoding the attentional features to generate the text prediction; and outputting the generated text prediction.
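
The pipeline named in the abstract (visual feature sequence -> contextual feature map -> combined visual feature space -> two-step attention decoder) can be illustrated as a neural module. Below is a minimal PyTorch sketch, not the patented implementation: the abstract does not disclose layer types, dimensions, or names, so the BiLSTM context model, the multi-head attention layers, the learned positional queries, and all parameter values here are assumptions made for illustration only.

    import torch
    import torch.nn as nn

    class SelectiveContextualRefinementBlock(nn.Module):
        """One refinement block: contextual feature map + two-step attention.

        All names, layer choices, and dimensions are illustrative guesses;
        the patent abstract does not specify them.
        """

        def __init__(self, dim=512, num_heads=8, vocab_size=97, max_len=32):
            super().__init__()
            # Contextual feature map over the visual feature sequence
            # (a BiLSTM is one plausible choice; the patent does not say).
            self.context = nn.LSTM(dim, dim // 2, bidirectional=True,
                                   batch_first=True)
            # Step 1 of the two-step attention: 1-D self-attention over
            # the combined visual feature space.
            self.self_attn = nn.MultiheadAttention(dim, num_heads,
                                                   batch_first=True)
            # Step 2: decode the attentional features into per-position
            # character predictions via learned positional queries.
            self.pos_queries = nn.Parameter(torch.randn(1, max_len, dim))
            self.cross_attn = nn.MultiheadAttention(dim, num_heads,
                                                    batch_first=True)
            self.classifier = nn.Linear(dim, vocab_size)

        def forward(self, visual_seq):
            # visual_seq: (batch, seq_len, dim) visual feature sequence
            # extracted from the image by a backbone CNN (not shown).
            ctx, _ = self.context(visual_seq)  # contextual feature map
            fused = visual_seq + ctx           # combine into a visual feature space
            attn, _ = self.self_attn(fused, fused, fused)  # 1-D self-attention
            queries = self.pos_queries.expand(fused.size(0), -1, -1)
            dec, _ = self.cross_attn(queries, attn, attn)  # decode attentional features
            logits = self.classifier(dec)      # (batch, max_len, vocab_size)
            return logits, fused               # fused feeds the next block in a stack

Stacking several such blocks, with each block's fused features feeding the next and the text prediction taken from the last block, would correspond to the "stack of selective contextual refinement blocks" the abstract describes.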

Bibliographic Details
Main authors: Litman, Roee; Wu, Jonathan; Litman, Ron; Tsiper, Shahar; Anschel, Oron; Manmatha, Raghavan; Mazor, Shai
Format: Patent (US11308354B1, 2022-04-19)
Language: English
Subjects: CALCULATING; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS
Online access: Full text via esp@cenet: https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20220419&DB=EPODOC&CC=US&NR=11308354B1