Text Enhancement Network for Cross-domain Scene Text Detection

Conventional scene text detection approaches essentially assume that training and test data are drawn from the same distribution and have achieved compelling results. However, scene text detectors often suffer from performance degradation in real-world applications, since the feature distribution of training images differs from that of test images obtained from a new scene. To address this problem, we propose a novel method called Text Enhancement Network (TEN), based on adversarial learning, for cross-domain scene text detection. Specifically, we first design a Multi-adversarial Feature Alignment (MFA) module to maximally align features of the source and target data, from low-level texture to high-level semantics. Second, we develop the Text Attention Enhancement (TAE) module to re-weight the importance of text regions and enhance the corresponding features accordingly, improving robustness against noisy backgrounds. Additionally, we design a self-training strategy to further boost the performance of our TEN. We conduct extensive experiments on five benchmarks, and the results demonstrate the effectiveness of our TEN.
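The record gives no implementation details, but the adversarial alignment the abstract describes is commonly built from a gradient reversal layer feeding one domain classifier per feature level. The PyTorch sketch below illustrates that standard construction under stated assumptions: the class names, channel widths, and the scaling factor `lam` are illustrative, not the authors' code.

```python
# A minimal sketch of multi-level adversarial feature alignment in the
# spirit of the MFA module. Module names, feature shapes, and the loss
# weighting are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on
    the backward pass, so the feature extractor is pushed toward
    domain-invariant features while the classifier learns to tell
    source from target."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class DomainClassifier(nn.Module):
    """Per-level binary classifier: source (0) vs. target (1)."""

    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, 1),
        )

    def forward(self, feat, lam=1.0):
        return self.net(GradReverse.apply(feat, lam))


def mfa_loss(feats, domain_labels, classifiers, lam=1.0):
    """Sum the adversarial loss over several feature levels, aligning
    low-level texture up to high-level semantics."""
    bce = nn.BCEWithLogitsLoss()
    loss = 0.0
    for feat, clf in zip(feats, classifiers):
        loss = loss + bce(clf(feat, lam), domain_labels)
    return loss
```

For a mixed batch, `domain_labels` would be `torch.zeros(b, 1)` for source images and `torch.ones(b, 1)` for target images, and DANN-style training typically ramps `lam` from 0 to 1 over the course of training.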

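The TAE module's re-weighting of text regions can likewise be pictured as a small spatial-attention block. The residual form `feat * (1 + a)` below is one common formulation and is an assumption here; the paper's exact design is not given in this record.

```python
# A minimal sketch of attention-based text-region re-weighting in the
# spirit of the TAE module: predict a spatial text-attention map and use
# it to amplify text features relative to noisy background. The layer
# layout and the residual formulation are illustrative assumptions.
import torch.nn as nn


class TextAttentionEnhancement(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One-channel spatial map scoring how "text-like" each location is.
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        # Residual re-weighting: keep the original features and boost
        # locations the attention map flags as text regions.
        a = self.attn(feat)      # (B, 1, H, W) in [0, 1]
        return feat * (1.0 + a)  # broadcast over channels
```

The third component, self-training, is typically realized by keeping the detector's high-confidence predictions on unlabeled target images as pseudo labels for further fine-tuning; the record does not state TEN's exact selection rule.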

Bibliographic Details

Published in: IEEE Signal Processing Letters, 2022, Vol. 29, p. 1-5
Main Authors: Deng, Jinhong; Luo, Xiulian; Zheng, Jiawen; Dang, Wanli; Li, Wen
Format: Article
Language: English
Publisher: New York: IEEE
DOI: 10.1109/LSP.2022.3214155
ISSN: 1070-9908
EISSN: 1558-2361
CODEN: ISPLEM
Subjects: Background noise; Convolution; Detectors; Domain adaptation; Domains; Feature extraction; Geometry; Modules; Object detection; Performance degradation; Scene text detection; Semantics; Training
Online Access: Order full text