GMIF: A Gated Multi-Scale Input Feature Fusion Scheme for Scene Text Detection

The feature fusion of the multi-scale features plays a significant role in localizing text instances of different sizes in the scene text detection (STD) paradigm. The existing approaches are not sufficient to tackle the issues of multi-scale text; consequently, their performance also varies with th...
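
The full abstract (the description field of this record) says that GMIF propagates features from low resolution to higher resolution through a gated recurrent unit-like mechanism. The sketch below is only an illustration of that general idea, not the authors' architecture: the function names (`gated_fuse`, `fuse_pyramid`), the 1x1 channel-mixing weights, the nearest-neighbour upsampling, and the random inputs are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(weight, feat):
    """1x1 convolution: mixes channels at every pixel. weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feat)

def gated_fuse(h_low, x_high, params):
    """GRU-style fusion (hypothetical): coarse state (C, H, W) into finer map (C, 2H, 2W)."""
    h = upsample2x(h_low)                                # bring the coarse state to the finer grid
    hx = np.concatenate([h, x_high], axis=0)             # (2C, 2H, 2W)
    z = sigmoid(conv1x1(params["W_z"], hx))              # update gate
    r = sigmoid(conv1x1(params["W_r"], hx))              # reset gate
    rhx = np.concatenate([r * h, x_high], axis=0)
    candidate = np.tanh(conv1x1(params["W_h"], rhx))     # candidate state
    return (1.0 - z) * h + z * candidate                 # gated blend of old and new

def init_params(channels, rng):
    """Random illustrative weights; a trained model would learn these."""
    shape = (channels, 2 * channels)
    return {name: rng.normal(scale=0.1, size=shape) for name in ("W_z", "W_r", "W_h")}

def fuse_pyramid(features, params):
    """features: (C, H, W) maps ordered coarse to fine, each twice the previous size."""
    state = features[0]
    for finer in features[1:]:
        state = gated_fuse(state, finer, params)
    return state

rng = np.random.default_rng(0)
C = 4
feats = [rng.normal(size=(C, 8 * 2**i, 8 * 2**i)) for i in range(3)]  # 8x8, 16x16, 32x32
fused = fuse_pyramid(feats, init_params(C, rng))
print(fused.shape)
```

The update gate decides, per pixel and channel, how much of the upsampled coarse state is kept and how much is replaced by the candidate computed from the finer map; sharing one set of weights across scales is a simplification chosen here for brevity.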

Detailed description

Bibliographic details
Published in:	IEEE access 2022, p.1-1
Main authors:	Ali, Tofik; Siddiqui, Mohammad F. H.; Shahab, Sana; Roy, Partha P.
Format:	Article
Language:	eng
Subjects:	deep neural networks; Electronic mail; Feature extraction; feature-fusion; GSM; Image segmentation; multi-scale feature; multi-scale text; Object detection; Real-time systems; Scene text detection; Testing
Online access:	Full text

container_end_page	1
container_issue
container_start_page	1
container_title	IEEE access
container_volume
creator	Ali, Tofik; Siddiqui, Mohammad F. H.; Shahab, Sana; Roy, Partha P.
description	The feature fusion of the multi-scale features plays a significant role in localizing text instances of different sizes in the scene text detection (STD) paradigm. The existing approaches are not sufficient to tackle the issues of multi-scale text; consequently, their performance also varies with the text size. Here we propose a gated multi-scale input feature fusion (GMIF) approach to overcome this issue in STD. The GMIF generates the local features from down-scaled input images and propagates these features from low resolution to the higher resolution global features through a gated recurrent unit-like mechanism. The consistent performance of the GMIF is validated with different text instance sizes of the test set of the Total-Text dataset. The GMIF obtained the performance in range (Precision 88.554-89.106, Recall 85.452-85.790, and f-measures 87.072-87.417) with marginal deviation, whereas the current state-of-the-art method, DBNet++, acquired in range (Precision 73.005-82.666, Recall 80.912-87.274, and f-measures 76.755-84.183) with significant deviation. Besides this, GMIF also achieved the best performance (f-measures) over ICDAR 2015 (as 88.0), Total-Text (as 87.4), and the second-best over the MSRA-TD500 (as 85.2) dataset. We have conducted an ablation study to show the impact of different components of the GMIF on the STD tasks, which shows the effectiveness of the overall GMIF approach.
doi_str_mv	10.1109/ACCESS.2022.3203691
format	Article
fullrecord	<record><control><sourceid>ieee</sourceid><recordid>TN_cdi_ieee_primary_9874745</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9874745</ieee_id><sourcerecordid>9874745</sourcerecordid><originalsourceid>FETCH-ieee_primary_98747453</originalsourceid><addsrcrecordid>eNp9ibkOgkAURScmJhLlC2jeD4CzsNoRBKTABnoywUfEsAWGRP9eCmtPc0_uIcRg1GKMBucwiuKisDjl3BKcCjdgO6Jx5gamcIR7IPqyvOiGv12Op5F7mmfJBUJIpcIH5GunWrOoZYeQDdOqIEGp1hkhWZd2HKCon9gjNOO8KQ4IJb4VXFFhrbZ-IvtGdgvqvz0SI4nL6Ga2iFhNc9vL-VMFvmd7tiP-1y-Bajxi</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GMIF: A Gated Multi-Scale Input Feature Fusion Scheme for Scene Text Detection</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Ali, Tofik ; Siddiqui, Mohammad F. H. ; Shahab, Sana ; Roy, Partha P.</creator><creatorcontrib>Ali, Tofik ; Siddiqui, Mohammad F. H. ; Shahab, Sana ; Roy, Partha P.</creatorcontrib><description>The feature fusion of the multi-scale features plays a significant role in localizing text instances of different sizes in the scene text detection (STD) paradigm. The existing approaches are not sufficient to tackle the issues of multi-scale text; consequently, their performance also varies with the text size. Here we propose a gated multi-scale input feature fusion (GMIF) approach to overcome this issue in STD. The GMIF generates the local features from down-scaled input images and propagates these features from low resolution to the higher resolution global features through a gated recurrent unit-like mechanism. The consistent performance of the GMIF is validated with different text instance sizes of the test-set of the Total-text dataset. 
The GMIF obtained the performance in range (Precision 88.554-89.106, Recall 85.452-85.790, and f-measures 87.072 - 87.417) with marginal deviation, whereas the current state-of-the-art method, DBNet++, acquired in range (Precision 73.005-82.666, Recall 80.912-87.274, and f-measures 76.755 - 84.183) with significant deviation. Besides this, GMIF also achieved the best performance (f-measures) over ICDAR 2015 (as 88.0), Total-Text (as 87.4), and the second-best over theMSRA-TD500 (as 85.2) dataset.We have conducted an ablation study to show the impact of different components of the GMIF on the STD tasks, which shows the effectiveness of the overall GMIF approach.</description><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2022.3203691</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>IEEE</publisher><subject>deep neural networks ; Electronic mail ; Feature extraction ; feature-fusion ; GSM ; Image segmentation ; multi-scale feature ; multi-scale text ; Object detection ; Real-time systems ; Scene text detection ; Testing</subject><ispartof>IEEE access, 2022, p.1-1</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-5735-5254</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9874745$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,777,781,861,4010,27614,27904,27905,27906,54914</link.rule.ids></links><search><creatorcontrib>Ali, Tofik</creatorcontrib><creatorcontrib>Siddiqui, Mohammad F. 
H.</creatorcontrib><creatorcontrib>Shahab, Sana</creatorcontrib><creatorcontrib>Roy, Partha P.</creatorcontrib><title>GMIF: A Gated Multi-Scale Input Feature Fusion Scheme for Scene Text Detection</title><title>IEEE access</title><addtitle>Access</addtitle><description>The feature fusion of the multi-scale features plays a significant role in localizing text instances of different sizes in the scene text detection (STD) paradigm. The existing approaches are not sufficient to tackle the issues of multi-scale text; consequently, their performance also varies with the text size. Here we propose a gated multi-scale input feature fusion (GMIF) approach to overcome this issue in STD. The GMIF generates the local features from down-scaled input images and propagates these features from low resolution to the higher resolution global features through a gated recurrent unit-like mechanism. The consistent performance of the GMIF is validated with different text instance sizes of the test-set of the Total-text dataset. The GMIF obtained the performance in range (Precision 88.554-89.106, Recall 85.452-85.790, and f-measures 87.072 - 87.417) with marginal deviation, whereas the current state-of-the-art method, DBNet++, acquired in range (Precision 73.005-82.666, Recall 80.912-87.274, and f-measures 76.755 - 84.183) with significant deviation. 
Besides this, GMIF also achieved the best performance (f-measures) over ICDAR 2015 (as 88.0), Total-Text (as 87.4), and the second-best over theMSRA-TD500 (as 85.2) dataset.We have conducted an ablation study to show the impact of different components of the GMIF on the STD tasks, which shows the effectiveness of the overall GMIF approach.</description><subject>deep neural networks</subject><subject>Electronic mail</subject><subject>Feature extraction</subject><subject>feature-fusion</subject><subject>GSM</subject><subject>Image segmentation</subject><subject>multi-scale feature</subject><subject>multi-scale text</subject><subject>Object detection</subject><subject>Real-time systems</subject><subject>Scene text detection</subject><subject>Testing</subject><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><recordid>eNp9ibkOgkAURScmJhLlC2jeD4CzsNoRBKTABnoywUfEsAWGRP9eCmtPc0_uIcRg1GKMBucwiuKisDjl3BKcCjdgO6Jx5gamcIR7IPqyvOiGv12Op5F7mmfJBUJIpcIH5GunWrOoZYeQDdOqIEGp1hkhWZd2HKCon9gjNOO8KQ4IJb4VXFFhrbZ-IvtGdgvqvz0SI4nL6Ga2iFhNc9vL-VMFvmd7tiP-1y-Bajxi</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Ali, Tofik</creator><creator>Siddiqui, Mohammad F. H.</creator><creator>Shahab, Sana</creator><creator>Roy, Partha P.</creator><general>IEEE</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><orcidid>https://orcid.org/0000-0002-5735-5254</orcidid></search><sort><creationdate>2022</creationdate><title>GMIF: A Gated Multi-Scale Input Feature Fusion Scheme for Scene Text Detection</title><author>Ali, Tofik ; Siddiqui, Mohammad F. H. 
; Shahab, Sana ; Roy, Partha P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-ieee_primary_98747453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>deep neural networks</topic><topic>Electronic mail</topic><topic>Feature extraction</topic><topic>feature-fusion</topic><topic>GSM</topic><topic>Image segmentation</topic><topic>multi-scale feature</topic><topic>multi-scale text</topic><topic>Object detection</topic><topic>Real-time systems</topic><topic>Scene text detection</topic><topic>Testing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ali, Tofik</creatorcontrib><creatorcontrib>Siddiqui, Mohammad F. H.</creatorcontrib><creatorcontrib>Shahab, Sana</creatorcontrib><creatorcontrib>Roy, Partha P.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ali, Tofik</au><au>Siddiqui, Mohammad F. H.</au><au>Shahab, Sana</au><au>Roy, Partha P.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GMIF: A Gated Multi-Scale Input Feature Fusion Scheme for Scene Text Detection</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2022</date><risdate>2022</risdate><spage>1</spage><epage>1</epage><pages>1-1</pages><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>The feature fusion of the multi-scale features plays a significant role in localizing text instances of different sizes in the scene text detection (STD) paradigm. 
The existing approaches are not sufficient to tackle the issues of multi-scale text; consequently, their performance also varies with the text size. Here we propose a gated multi-scale input feature fusion (GMIF) approach to overcome this issue in STD. The GMIF generates the local features from down-scaled input images and propagates these features from low resolution to the higher resolution global features through a gated recurrent unit-like mechanism. The consistent performance of the GMIF is validated with different text instance sizes of the test-set of the Total-text dataset. The GMIF obtained the performance in range (Precision 88.554-89.106, Recall 85.452-85.790, and f-measures 87.072 - 87.417) with marginal deviation, whereas the current state-of-the-art method, DBNet++, acquired in range (Precision 73.005-82.666, Recall 80.912-87.274, and f-measures 76.755 - 84.183) with significant deviation. Besides this, GMIF also achieved the best performance (f-measures) over ICDAR 2015 (as 88.0), Total-Text (as 87.4), and the second-best over theMSRA-TD500 (as 85.2) dataset.We have conducted an ablation study to show the impact of different components of the GMIF on the STD tasks, which shows the effectiveness of the overall GMIF approach.</abstract><pub>IEEE</pub><doi>10.1109/ACCESS.2022.3203691</doi><orcidid>https://orcid.org/0000-0002-5735-5254</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2169-3536
ispartof	IEEE access, 2022, p.1-1
issn	2169-3536
language	eng
recordid	cdi_ieee_primary_9874745
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects	deep neural networks; Electronic mail; Feature extraction; feature-fusion; GSM; Image segmentation; multi-scale feature; multi-scale text; Object detection; Real-time systems; Scene text detection; Testing
title	GMIF: A Gated Multi-Scale Input Feature Fusion Scheme for Scene Text Detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T01%3A53%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GMIF:%20A%20Gated%20Multi-Scale%20Input%20Feature%20Fusion%20Scheme%20for%20Scene%20Text%20Detection&rft.jtitle=IEEE%20access&rft.au=Ali,%20Tofik&rft.date=2022&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2022.3203691&rft_dat=%3Cieee%3E9874745%3C/ieee%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9874745&rfr_iscdi=true
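
The description field contrasts a "marginal deviation" for GMIF with a "significant deviation" for DBNet++ across text instance sizes on Total-Text. As a rough way to read those figures, the sketch below computes the width (maximum minus minimum) of each reported range. The numbers are copied from the description; treating range width as the measure of deviation, and the names `reported`, `spread`, and `spreads`, are assumptions of this example rather than anything stated in the record.

```python
# (min, max) ranges across text instance sizes, as quoted in the description field.
reported = {
    "GMIF": {
        "precision": (88.554, 89.106),
        "recall": (85.452, 85.790),
        "f_measure": (87.072, 87.417),
    },
    "DBNet++": {
        "precision": (73.005, 82.666),
        "recall": (80.912, 87.274),
        "f_measure": (76.755, 84.183),
    },
}

def spread(low_high):
    """Width of a (min, max) range, rounded to the three decimals used in the source."""
    low, high = low_high
    return round(high - low, 3)

# Range width per method and metric (an assumed stand-in for "deviation").
spreads = {
    method: {metric: spread(bounds) for metric, bounds in metrics.items()}
    for method, metrics in reported.items()
}

for method, widths in spreads.items():
    print(method, widths)
```

Under this reading, every GMIF range is narrower than one point while every DBNet++ range is wider than six points, which matches the qualitative claim in the description.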