Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection an...

Detailed description

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence 2021-02, Vol.43 (2), p.532-548
Main authors:	Liao, Minghui; Lyu, Pengyuan; He, Minghang; Yao, Cong; Wu, Wenhao; Bai, Xiang
Format:	Article
Language:	eng
Subjects:	arbitrary shapes; attention; Datasets; Detectors; Image segmentation; Modules; Neural networks; Object recognition; Proposals; scene text detection; scene text recognition; Scene text spotting; segmentation; Shape; Shape recognition; Task analysis; Text recognition; Training
Online access:	Order full text

container_end_page	548
container_issue	2
container_start_page	532
container_title	IEEE transactions on pattern analysis and machine intelligence
container_volume	43
creator	Liao, Minghui; Lyu, Pengyuan; He, Minghang; Yao, Cong; Wu, Wenhao; Bai, Xiang
description	Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named as Mask TextSpotter is presented. Different from the previous text spotters that follow the pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition can be achieved directly from two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance the performance and universality. Benefiting from the proposed two-dimensional representation on both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
doi_str_mv	10.1109/TPAMI.2019.2937086
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TPAMI_2019_2937086</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8812908</ieee_id><sourcerecordid>2280525703</sourcerecordid><originalsourceid>FETCH-LOGICAL-c395t-a707e6c0efa2d10be7bdadc835a9a7fdb984a1b900d12148f61f0dd150200d0f3</originalsourceid><addsrcrecordid>eNpdkEFP3DAQha2qFSyUP0ClylIvXLKdsZON3dtqRSkSFCS2Z9eJJxDIJovtCPj3mN0th56eNPO9p5nH2DHCFBH09-X1_PJ8KgD1VGhZgpp9YBPUUmeykPojmwDORKaUUPvsIIR7AMwLkHtsX2Kea4Biwv5e2vDAl_Qcb9ZDjOR_8HnPT3uXxSFLwpfetr2tOuK_afS2SxKfBv_Am8HzjaftbzcB_KmNd3zuqzZ661_4zZ1dU_jMPjW2C3S000P25-fpcvEru7g6O1_ML7Ja6iJmtoSSZjVQY4VDqKisnHW1koXVtmxcpVVusUpHOxSYq2aGDTiHBYg0gkYespNt7toPjyOFaFZtqKnrbE_DGIwQCgpRlCAT-u0_9H4YfZ-uMyIvFUpd5jpRYkvVfgjBU2PWvl2lxwyCeevfbPo3b_2bXf_J9HUXPVYrcu-Wf4Un4MsWaInofa0UCg1KvgITG4kR</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2478139749</pqid></control><display><type>article</type><title>Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes</title><source>IEEE Electronic Library (IEL)</source><creator>Liao, Minghui ; Lyu, Pengyuan ; He, Minghang ; Yao, Cong ; Wu, Wenhao ; Bai, Xiang</creator><creatorcontrib>Liao, Minghui ; Lyu, Pengyuan ; He, Minghang ; Yao, Cong ; Wu, Wenhao ; Bai, Xiang</creatorcontrib><description>Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named as Mask TextSpotter is presented. 
Different from the previous text spotters that follow the pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition can be achieved directly from two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance the performance and universality. Benefiting from the proposed two-dimensional representation on both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.</description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2019.2937086</identifier><identifier>PMID: 31449005</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>arbitrary shapes ; attention ; Datasets ; Detectors ; Image segmentation ; Modules ; Neural networks ; Object recognition ; Proposals ; scene text detection ; scene text recognition ; Scene text spotting ; segmentation ; Shape ; Shape recognition ; Task analysis ; Text recognition ; Training</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2021-02, Vol.43 (2), p.532-548</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c395t-a707e6c0efa2d10be7bdadc835a9a7fdb984a1b900d12148f61f0dd150200d0f3</citedby><cites>FETCH-LOGICAL-c395t-a707e6c0efa2d10be7bdadc835a9a7fdb984a1b900d12148f61f0dd150200d0f3</cites><orcidid>0000-0001-6564-4796 ; 0000-0003-3153-8519 ; 0000-0002-3449-5940 ; 0000-0002-2583-4314</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8812908$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8812908$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31449005$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Liao, Minghui</creatorcontrib><creatorcontrib>Lyu, Pengyuan</creatorcontrib><creatorcontrib>He, Minghang</creatorcontrib><creatorcontrib>Yao, Cong</creatorcontrib><creatorcontrib>Wu, Wenhao</creatorcontrib><creatorcontrib>Bai, Xiang</creatorcontrib><title>Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description>Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named as Mask TextSpotter is presented. 
Different from the previous text spotters that follow the pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition can be achieved directly from two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance the performance and universality. Benefiting from the proposed two-dimensional representation on both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.</description><subject>arbitrary shapes</subject><subject>attention</subject><subject>Datasets</subject><subject>Detectors</subject><subject>Image segmentation</subject><subject>Modules</subject><subject>Neural networks</subject><subject>Object recognition</subject><subject>Proposals</subject><subject>scene text detection</subject><subject>scene text recognition</subject><subject>Scene text spotting</subject><subject>segmentation</subject><subject>Shape</subject><subject>Shape recognition</subject><subject>Task analysis</subject><subject>Text 
recognition</subject><subject>Training</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkEFP3DAQha2qFSyUP0ClylIvXLKdsZON3dtqRSkSFCS2Z9eJJxDIJovtCPj3mN0th56eNPO9p5nH2DHCFBH09-X1_PJ8KgD1VGhZgpp9YBPUUmeykPojmwDORKaUUPvsIIR7AMwLkHtsX2Kea4Biwv5e2vDAl_Qcb9ZDjOR_8HnPT3uXxSFLwpfetr2tOuK_afS2SxKfBv_Am8HzjaftbzcB_KmNd3zuqzZ661_4zZ1dU_jMPjW2C3S000P25-fpcvEru7g6O1_ML7Ja6iJmtoSSZjVQY4VDqKisnHW1koXVtmxcpVVusUpHOxSYq2aGDTiHBYg0gkYespNt7toPjyOFaFZtqKnrbE_DGIwQCgpRlCAT-u0_9H4YfZ-uMyIvFUpd5jpRYkvVfgjBU2PWvl2lxwyCeevfbPo3b_2bXf_J9HUXPVYrcu-Wf4Un4MsWaInofa0UCg1KvgITG4kR</recordid><startdate>20210201</startdate><enddate>20210201</enddate><creator>Liao, Minghui</creator><creator>Lyu, Pengyuan</creator><creator>He, Minghang</creator><creator>Yao, Cong</creator><creator>Wu, Wenhao</creator><creator>Bai, Xiang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-6564-4796</orcidid><orcidid>https://orcid.org/0000-0003-3153-8519</orcidid><orcidid>https://orcid.org/0000-0002-3449-5940</orcidid><orcidid>https://orcid.org/0000-0002-2583-4314</orcidid></search><sort><creationdate>20210201</creationdate><title>Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes</title><author>Liao, Minghui ; Lyu, Pengyuan ; He, Minghang ; Yao, Cong ; Wu, Wenhao ; Bai, Xiang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c395t-a707e6c0efa2d10be7bdadc835a9a7fdb984a1b900d12148f61f0dd150200d0f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>arbitrary shapes</topic><topic>attention</topic><topic>Datasets</topic><topic>Detectors</topic><topic>Image segmentation</topic><topic>Modules</topic><topic>Neural networks</topic><topic>Object recognition</topic><topic>Proposals</topic><topic>scene text detection</topic><topic>scene text recognition</topic><topic>Scene text spotting</topic><topic>segmentation</topic><topic>Shape</topic><topic>Shape recognition</topic><topic>Task analysis</topic><topic>Text recognition</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liao, Minghui</creatorcontrib><creatorcontrib>Lyu, Pengyuan</creatorcontrib><creatorcontrib>He, Minghang</creatorcontrib><creatorcontrib>Yao, Cong</creatorcontrib><creatorcontrib>Wu, Wenhao</creatorcontrib><creatorcontrib>Bai, Xiang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals 
Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Liao, Minghui</au><au>Lyu, Pengyuan</au><au>He, Minghang</au><au>Yao, Cong</au><au>Wu, Wenhao</au><au>Bai, Xiang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2021-02-01</date><risdate>2021</risdate><volume>43</volume><issue>2</issue><spage>532</spage><epage>548</epage><pages>532-548</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract>Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named as Mask TextSpotter is presented. 
Different from the previous text spotters that follow the pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition can be achieved directly from two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance the performance and universality. Benefiting from the proposed two-dimensional representation on both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>31449005</pmid><doi>10.1109/TPAMI.2019.2937086</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0001-6564-4796</orcidid><orcidid>https://orcid.org/0000-0003-3153-8519</orcidid><orcidid>https://orcid.org/0000-0002-3449-5940</orcidid><orcidid>https://orcid.org/0000-0002-2583-4314</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0162-8828
ispartof	IEEE transactions on pattern analysis and machine intelligence, 2021-02, Vol.43 (2), p.532-548
issn	0162-8828 1939-3539 2160-9292
language	eng
recordid	cdi_crossref_primary_10_1109_TPAMI_2019_2937086
source	IEEE Electronic Library (IEL)
subjects	arbitrary shapes; attention; Datasets; Detectors; Image segmentation; Modules; Neural networks; Object recognition; Proposals; scene text detection; scene text recognition; Scene text spotting; segmentation; Shape; Shape recognition; Task analysis; Text recognition; Training
title	Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T05%3A33%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mask%20TextSpotter:%20An%20End-to-End%20Trainable%20Neural%20Network%20for%20Spotting%20Text%20with%20Arbitrary%20Shapes&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Liao,%20Minghui&rft.date=2021-02-01&rft.volume=43&rft.issue=2&rft.spage=532&rft.epage=548&rft.pages=532-548&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2019.2937086&rft_dat=%3Cproquest_RIE%3E2280525703%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2478139749&rft_id=info:pmid/31449005&rft_ieee_id=8812908&rfr_iscdi=true