Text Enhancement Network for Cross-domain Scene Text Detection

Conventional scene text detection approaches essentially assume that training and test data are drawn from the same distribution and have achieved compelling results. However, scene text detectors often suffer from performance degradation in real-world applications, since the feature distribution of training images differs from that of test images obtained from a new scene. To address this problem, we propose a novel method called Text Enhancement Network (TEN), based on adversarial learning, for cross-domain scene text detection. Specifically, we first design a Multi-adversarial Feature Alignment (MFA) module to maximally align features of the source and target data, from low-level texture to high-level semantics. Second, we develop the Text Attention Enhancement (TAE) module to re-weight the importance of text regions and enhance the corresponding features accordingly, improving robustness against noisy backgrounds. Additionally, we design a self-training strategy to further boost the performance of our TEN. We conduct extensive experiments on five benchmarks, and the results demonstrate the effectiveness of our TEN.
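The record gives no implementation details, but the adversarial alignment the abstract describes is commonly built from a gradient reversal layer feeding one domain classifier per feature level. The PyTorch sketch below illustrates that standard construction under stated assumptions: the class names, channel widths, and the scaling factor `lam` are illustrative, not the authors' code.

```python
# A minimal sketch of multi-level adversarial feature alignment in the
# spirit of the MFA module. Module names, feature shapes, and the loss
# weighting are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on
    the backward pass, so the feature extractor is pushed toward
    domain-invariant features while the classifier learns to tell
    source from target."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class DomainClassifier(nn.Module):
    """Per-level binary classifier: source (0) vs. target (1)."""

    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, 1),
        )

    def forward(self, feat, lam=1.0):
        return self.net(GradReverse.apply(feat, lam))


def mfa_loss(feats, domain_labels, classifiers, lam=1.0):
    """Sum the adversarial loss over several feature levels, aligning
    low-level texture up to high-level semantics."""
    bce = nn.BCEWithLogitsLoss()
    loss = 0.0
    for feat, clf in zip(feats, classifiers):
        loss = loss + bce(clf(feat, lam), domain_labels)
    return loss
```

For a mixed batch, `domain_labels` would be `torch.zeros(b, 1)` for source images and `torch.ones(b, 1)` for target images, and DANN-style training typically ramps `lam` from 0 to 1 over the course of training.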

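The TAE module's re-weighting of text regions can likewise be pictured as a small spatial-attention block. The residual form `feat * (1 + a)` below is one common formulation and is an assumption here; the paper's exact design is not given in this record.

```python
# A minimal sketch of attention-based text-region re-weighting in the
# spirit of the TAE module: predict a spatial text-attention map and use
# it to amplify text features relative to noisy background. The layer
# layout and the residual formulation are illustrative assumptions.
import torch.nn as nn


class TextAttentionEnhancement(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One-channel spatial map scoring how "text-like" each location is.
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        # Residual re-weighting: keep the original features and boost
        # locations the attention map flags as text regions.
        a = self.attn(feat)      # (B, 1, H, W) in [0, 1]
        return feat * (1.0 + a)  # broadcast over channels
```

The third component, self-training, is typically realized by keeping the detector's high-confidence predictions on unlabeled target images as pseudo labels for further fine-tuning; the record does not state TEN's exact selection rule.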

Bibliographic Details

Published in: IEEE Signal Processing Letters, 2022, Vol. 29, p. 1-5
Main Authors: Deng, Jinhong; Luo, Xiulian; Zheng, Jiawen; Dang, Wanli; Li, Wen
Format: Article
Language: English
Publisher: New York: IEEE
DOI: 10.1109/LSP.2022.3214155
ISSN: 1070-9908
EISSN: 1558-2361
CODEN: ISPLEM
Subjects: Background noise; Convolution; Detectors; Domain adaptation; Domains; Feature extraction; Geometry; Modules; Object detection; Performance degradation; Scene text detection; Semantics; Training
Online Access: Order full text