Robust Scene Text Recognition with Automatic Rectification

Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with A...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2016-04
Hauptverfasser: Shi, Baoguang, Wang, Xinggang, Lyu, Pengyuan, Yao, Cong, Bai, Xiang
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Shi, Baoguang
Wang, Xinggang
Lyu, Pengyuan
Yao, Cong
Bai, Xiang
description Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2078110125</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2078110125</sourcerecordid><originalsourceid>FETCH-proquest_journals_20781101253</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSwCspPKi0uUQhOTs1LVQhJrShRCEpNzk_PyyzJzM9TKM8syVBwLC3Jz00syUwGSZVkpmUmJ4IkeRhY0xJzilN5oTQ3g7Kba4izh25BUX5haWpxSXxWfmlRHlAq3sjA3MLQ0MDQyNSYOFUAz9U27w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2078110125</pqid></control><display><type>article</type><title>Robust Scene Text Recognition with Automatic Rectification</title><source>Free E- Journals</source><creator>Shi, Baoguang ; Wang, Xinggang ; Lyu, Pengyuan ; Yao, Cong ; Bai, Xiang</creator><creatorcontrib>Shi, Baoguang ; Wang, Xinggang ; Lyu, Pengyuan ; Yao, Cong ; Bai, Xiang</creatorcontrib><description>Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Neural networks ; Object recognition ; Robustness</subject><ispartof>arXiv.org, 2016-04</ispartof><rights>2016. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Shi, Baoguang</creatorcontrib><creatorcontrib>Wang, Xinggang</creatorcontrib><creatorcontrib>Lyu, Pengyuan</creatorcontrib><creatorcontrib>Yao, Cong</creatorcontrib><creatorcontrib>Bai, Xiang</creatorcontrib><title>Robust Scene Text Recognition with Automatic Rectification</title><title>arXiv.org</title><description>Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.</description><subject>Neural networks</subject><subject>Object recognition</subject><subject>Robustness</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSwCspPKi0uUQhOTs1LVQhJrShRCEpNzk_PyyzJzM9TKM8syVBwLC3Jz00syUwGSZVkpmUmJ4IkeRhY0xJzilN5oTQ3g7Kba4izh25BUX5haWpxSXxWfmlRHlAq3sjA3MLQ0MDQyNSYOFUAz9U27w</recordid><startdate>20160419</startdate><enddate>20160419</enddate><creator>Shi, Baoguang</creator><creator>Wang, Xinggang</creator><creator>Lyu, Pengyuan</creator><creator>Yao, Cong</creator><creator>Bai, Xiang</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20160419</creationdate><title>Robust Scene Text Recognition with Automatic Rectification</title><author>Shi, Baoguang ; Wang, Xinggang ; Lyu, Pengyuan ; Yao, Cong ; Bai, Xiang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20781101253</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Neural networks</topic><topic>Object recognition</topic><topic>Robustness</topic><toplevel>online_resources</toplevel><creatorcontrib>Shi, Baoguang</creatorcontrib><creatorcontrib>Wang, Xinggang</creatorcontrib><creatorcontrib>Lyu, Pengyuan</creatorcontrib><creatorcontrib>Yao, Cong</creatorcontrib><creatorcontrib>Bai, Xiang</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shi, Baoguang</au><au>Wang, Xinggang</au><au>Lyu, Pengyuan</au><au>Yao, Cong</au><au>Bai, Xiang</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Robust Scene Text Recognition with Automatic Rectification</atitle><jtitle>arXiv.org</jtitle><date>2016-04-19</date><risdate>2016</risdate><eissn>2331-8422</eissn><abstract>Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2016-04
issn 2331-8422
language eng
recordid cdi_proquest_journals_2078110125
source Free E- Journals
subjects Neural networks
Object recognition
Robustness
title Robust Scene Text Recognition with Automatic Rectification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T06%3A45%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Robust%20Scene%20Text%20Recognition%20with%20Automatic%20Rectification&rft.jtitle=arXiv.org&rft.au=Shi,%20Baoguang&rft.date=2016-04-19&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2078110125%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2078110125&rft_id=info:pmid/&rfr_iscdi=true