Arbitrary Shape Natural Scene Text Detection Method Based on Soft Attention Mechanism and Dilated Convolution

Natural scene text detection has attracted much attention in computer vision and is widely used in applications such as unmanned driving and robot sensing. Some methods have been proposed for horizontal and oriented text detection, but detecting irregular shapes and...

Bibliographic Details
Published in: IEEE access 2020, Vol.8, p.122685-122694
Main Authors: Qin, Xiao; Jiang, Jianhui; Yuan, Chang-An; Qiao, Shaojie; Fan, Wei
Format: Article
Language: English
Subjects:
Online Access: Full text
container_end_page 122694
container_issue
container_start_page 122685
container_title IEEE access
container_volume 8
creator Qin, Xiao
Jiang, Jianhui
Yuan, Chang-An
Qiao, Shaojie
Fan, Wei
description Natural scene text detection has attracted much attention in computer vision and is widely used in applications such as unmanned driving and robot sensing. Some methods have been proposed for horizontal and oriented text detection, but detecting text with irregular shapes and highly varying orientations is still a challenging problem. To tackle this problem, we propose a robust arbitrary-shape text detection method called Soft Dilated network (SDnet). The proposed method has two essential steps: (1) feature extraction by the backbone; (2) a post-processing approach that generates elaborated polygons or boundaries. In particular, the backbone is based on a soft attention mechanism and dilated convolution. The soft attention mechanism learns and extracts the important features from each feature channel, and dilated convolution effectively aggregates multi-scale contextual information without losing resolution, which enhances the robustness of the network model. The proposed method can accurately detect curved text and discriminate between text and non-text areas in an efficient fashion. In addition, the Jaccard coefficient is used as the loss function to improve the post-processing capability of detecting sparsely arranged and arbitrary-shape text. Based on the aforementioned techniques, the proposed method can effectively handle the problem of detecting sparsely arranged, arbitrary-shape natural scene text. Experiments were conducted on three benchmark datasets: the curved text datasets CTW1500 and Total-Text and the oriented text dataset ICDAR2015. The results show that, compared with state-of-the-art text detection methods, the proposed method is more robust and can find smaller text blocks in the image because the loss function is calculated with the Jaccard coefficient. Furthermore, we performed multiple sets of ablation experiments, which verify the effectiveness of the proposed method.
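The description above says the Jaccard coefficient serves as the loss function, but this record does not give the exact formulation. As a rough illustration only, the sketch below uses a common "soft" Jaccard (intersection over union) loss on flattened per-pixel text probabilities. The function name `soft_jaccard_loss` and the smoothing constant `eps` are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def soft_jaccard_loss(pred: Sequence[float], target: Sequence[float],
                      eps: float = 1e-6) -> float:
    """Return 1 - soft Jaccard coefficient between two flattened maps.

    pred   -- predicted per-pixel text probabilities in [0, 1]
    target -- ground-truth mask values (1 for text, 0 for background)
    eps    -- hypothetical smoothing term; avoids 0/0 when both maps are empty
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Soft intersection: sum of element-wise products.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Soft union: |A| + |B| - |A intersect B|.
    union = sum(pred) + sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


if __name__ == "__main__":
    # Half of the predicted text pixels overlap the ground truth.
    print(soft_jaccard_loss([1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]))
```

Because this ratio is normalised by the union rather than by the image area, a small text region that is missed entirely still produces a loss close to 1. That is one plausible reading of why the description credits the Jaccard-based loss with finding smaller text blocks.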
doi_str_mv 10.1109/ACCESS.2020.3007351
format Article
coden IAECCG
doaj_id oai_doaj_org_article_d4c608e37975483dac7aadf85d358fb7
eissn 2169-3536
ieee_id 9133383
oa free_for_read
orcid https://orcid.org/0000-0003-3237-7083
https://orcid.org/0000-0002-4703-780X
https://orcid.org/0000-0003-4317-0299
pqid 2454643183
publisher PISCATAWAY: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
tpages 10
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2020, Vol.8, p.122685-122694
issn 2169-3536
2169-3536
language eng
recordid cdi_webofscience_primary_000553722200001
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Web of Science - Science Citation Index Expanded - 2020
subjects Ablation
Aggregates
Backbone
Computer Science
Computer Science, Information Systems
Computer vision
Convolution
Datasets
deep learning
dilated convolutions
Engineering
Engineering, Electrical & Electronic
Feature extraction
Horizontal orientation
Image segmentation
Jaccard coefficient
Post-production processing
Residual neural networks
Robot sensors
Robustness
Science & Technology
Semantics
Shape
Shape recognition
soft attention mechanism
Technology
Telecommunications
Text detection
title Arbitrary Shape Natural Scene Text Detection Method Based on Soft Attention Mechanism and Dilated Convolution
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-05T16%3A07%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Arbitrary%20Shape%20Natural%20Scene%20Text%20Detection%20Method%20Based%20on%20Soft%20Attention%20Mechanism%20and%20Dilated%20Convolution&rft.jtitle=IEEE%20access&rft.au=Qin,%20Xiao&rft.date=2020&rft.volume=8&rft.spage=122685&rft.epage=122694&rft.pages=122685-122694&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2020.3007351&rft_dat=%3Cproquest_webof%3E2454643183%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2454643183&rft_id=info:pmid/&rft_ieee_id=9133383&rft_doaj_id=oai_doaj_org_article_d4c608e37975483dac7aadf85d358fb7&rfr_iscdi=true