Multi‐lingual text detection and identification using agile convolutional neural network

Multi‐lingual scene text detection and identification is a challenging task in today's world due to the prevalence of many digitized multi‐lingual documents, images, and videos. A valuable method for detecting multi‐lingual text from natural scene images is proposed which uses the convolutional...

Bibliographic details
Published in:	Computational intelligence 2021-11, Vol.37 (4), p.1803-1826
Main authors:	Yegnaraman, Aparna; Valli, S.
Format:	Article
Language:	eng
Subjects:	Artificial neural networks; Aspect ratio; atrous separable convolution; complete IoU loss; Feature maps; multi‐lingual text identification; Neural networks; non‐maximal suppression; scene text detection; You Only Look Once
Online access:	Full text

container_end_page	1826
container_issue	4
container_start_page	1803
container_title	Computational intelligence
container_volume	37
creator	Yegnaraman, Aparna; Valli, S.
description	Multi‐lingual scene text detection and identification is a challenging task in today's world due to the prevalence of many digitized multi‐lingual documents, images, and videos. A valuable method for detecting multi‐lingual text from natural scene images is proposed which uses the convolutional neural network, namely, You Only Look Once (YOLOv3) as the backbone. The proposed system is more agile than YOLOv3 with the introduction of atrous separable convolution (ASC). The multi‐scale prediction in YOLOv3 emphasizes the integration of global features of multi‐scale convolutional layers while it overlooks the blend of the multi‐scale local region features on the same convolutional layer. To overcome this, ASC is applied to efficiently compute dense local region feature maps, thereby reducing computation complexity substantially. Complete IoU loss, which is an accumulation of overlap area, distance, and aspect ratio, is introduced for enhanced accuracy in bounding box regression, wherein IoU designates the measure of overlap between the predicted and the ground truth bounding boxes. The experimental results show that the proposed system is efficacious in detecting multi‐lingual as well as English text from natural scene images.
doi_str_mv	10.1111/coin.12467
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2602899661</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2602899661</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3017-2c6a77939a7c6748033882e83527f52f59632ccab5bf4ec7bc18e9e8c50114a83</originalsourceid><addsrcrecordid>eNp9kM1KAzEUhYMoWKsbn2DAnTA1f5NkllL8KVS70Y2bkKaZkjpOan6s3fkIPqNPYtpx7d0cOHzncu8B4BzBEcpzpZ3tRghTxg_AAGUpBaPwEAygwLTkNamOwUkIKwghIlQMwMtDaqP9-fpubbdMqi2i-YzFwkSjo3VdobpFYRemi7axWu2tFDJaqKVtTaFd9-HatPNztjPJ7yVunH89BUeNaoM5-9MheL69eRrfl9PZ3WR8PS01gYiXWDPF82W14ppxKiAhQmAjSIV5U-GmqhnBWqt5NW-o0XyukTC1EbqCCFElyBBc9HvX3r0nE6JcueTzPUFiBrGoa8ZQpi57SnsXgjeNXHv7pvxWIih33cldd3LfXYZRD2_yk9t_SDmeTR77zC-FRXNR</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2602899661</pqid></control><display><type>article</type><title>Multi‐lingual text detection and identification using agile convolutional neural network</title><source>EBSCOhost Business Source Complete</source><source>Access via Wiley Online Library</source><creator>Yegnaraman, Aparna ; Valli, S.</creator><creatorcontrib>Yegnaraman, Aparna ; Valli, S.</creatorcontrib><description>Multi‐lingual scene text detection and identification is a challenging task in today's world due to the prevalence of many digitized multi‐lingual documents, images, and videos. A valuable method for detecting multi‐lingual text from natural scene images is proposed which uses the convolutional neural network, namely, You Only Look Once (YOLOv3) as the backbone. The proposed system is more agile than YOLOv3 with the introduction of atrous separable convolution (ASC). The multi‐scale prediction in YOLOv3 emphasizes the integration of global features of multi‐scale convolutional layers while it overlooks the blend of the multi‐scale local region features on the same convolutional layer. 
To overcome this, ASC is applied to efficiently compute dense local region feature maps, thereby reducing computation complexity substantially. Complete IoU loss, which is an accumulation of overlap area, distance, and aspect ratio, is introduced for enhanced accuracy in bounding box regression, wherein IoU designates the measure of overlap between the predicted and the ground truth bounding boxes. The experimental results show that the proposed system is efficacious in detecting multi‐lingual as well as English text from natural scene images.</description><identifier>ISSN: 0824-7935</identifier><identifier>EISSN: 1467-8640</identifier><identifier>DOI: 10.1111/coin.12467</identifier><language>eng</language><publisher>Hoboken: Blackwell Publishing Ltd</publisher><subject>Artificial neural networks ; Aspect ratio ; atrous separable convolution ; complete IoU loss ; Feature maps ; multi‐lingual text identification ; Neural networks ; non‐maximal suppression ; scene text detection ; You Only Look Once</subject><ispartof>Computational intelligence, 2021-11, Vol.37 (4), p.1803-1826</ispartof><rights>2021 Wiley Periodicals LLC.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3017-2c6a77939a7c6748033882e83527f52f59632ccab5bf4ec7bc18e9e8c50114a83</citedby><cites>FETCH-LOGICAL-c3017-2c6a77939a7c6748033882e83527f52f59632ccab5bf4ec7bc18e9e8c50114a83</cites><orcidid>0000-0002-5759-7851</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fcoin.12467$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Fcoin.12467$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,27924,27925,45574,45575</link.rule.ids></links><search><creatorcontrib>Yegnaraman, 
Aparna</creatorcontrib><creatorcontrib>Valli, S.</creatorcontrib><title>Multi‐lingual text detection and identification using agile convolutional neural network</title><title>Computational intelligence</title><description>Multi‐lingual scene text detection and identification is a challenging task in today's world due to the prevalence of many digitized multi‐lingual documents, images, and videos. A valuable method for detecting multi‐lingual text from natural scene images is proposed which uses the convolutional neural network, namely, You Only Look Once (YOLOv3) as the backbone. The proposed system is more agile than YOLOv3 with the introduction of atrous separable convolution (ASC). The multi‐scale prediction in YOLOv3 emphasizes the integration of global features of multi‐scale convolutional layers while it overlooks the blend of the multi‐scale local region features on the same convolutional layer. To overcome this, ASC is applied to efficiently compute dense local region feature maps, thereby reducing computation complexity substantially. Complete IoU loss, which is an accumulation of overlap area, distance, and aspect ratio, is introduced for enhanced accuracy in bounding box regression, wherein IoU designates the measure of overlap between the predicted and the ground truth bounding boxes. 
The experimental results show that the proposed system is efficacious in detecting multi‐lingual as well as English text from natural scene images.</description><subject>Artificial neural networks</subject><subject>Aspect ratio</subject><subject>atrous separable convolution</subject><subject>complete IoU loss</subject><subject>Feature maps</subject><subject>multi‐lingual text identification</subject><subject>Neural networks</subject><subject>non‐maximal suppression</subject><subject>scene text detection</subject><subject>You Only Look Once</subject><issn>0824-7935</issn><issn>1467-8640</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kM1KAzEUhYMoWKsbn2DAnTA1f5NkllL8KVS70Y2bkKaZkjpOan6s3fkIPqNPYtpx7d0cOHzncu8B4BzBEcpzpZ3tRghTxg_AAGUpBaPwEAygwLTkNamOwUkIKwghIlQMwMtDaqP9-fpubbdMqi2i-YzFwkSjo3VdobpFYRemi7axWu2tFDJaqKVtTaFd9-HatPNztjPJ7yVunH89BUeNaoM5-9MheL69eRrfl9PZ3WR8PS01gYiXWDPF82W14ppxKiAhQmAjSIV5U-GmqhnBWqt5NW-o0XyukTC1EbqCCFElyBBc9HvX3r0nE6JcueTzPUFiBrGoa8ZQpi57SnsXgjeNXHv7pvxWIih33cldd3LfXYZRD2_yk9t_SDmeTR77zC-FRXNR</recordid><startdate>202111</startdate><enddate>202111</enddate><creator>Yegnaraman, Aparna</creator><creator>Valli, S.</creator><general>Blackwell Publishing Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-5759-7851</orcidid></search><sort><creationdate>202111</creationdate><title>Multi‐lingual text detection and identification using agile convolutional neural network</title><author>Yegnaraman, Aparna ; Valli, S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3017-2c6a77939a7c6748033882e83527f52f59632ccab5bf4ec7bc18e9e8c50114a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Artificial neural 
networks</topic><topic>Aspect ratio</topic><topic>atrous separable convolution</topic><topic>complete IoU loss</topic><topic>Feature maps</topic><topic>multi‐lingual text identification</topic><topic>Neural networks</topic><topic>non‐maximal suppression</topic><topic>scene text detection</topic><topic>You Only Look Once</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yegnaraman, Aparna</creatorcontrib><creatorcontrib>Valli, S.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Computational intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yegnaraman, Aparna</au><au>Valli, S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multi‐lingual text detection and identification using agile convolutional neural network</atitle><jtitle>Computational intelligence</jtitle><date>2021-11</date><risdate>2021</risdate><volume>37</volume><issue>4</issue><spage>1803</spage><epage>1826</epage><pages>1803-1826</pages><issn>0824-7935</issn><eissn>1467-8640</eissn><abstract>Multi‐lingual scene text detection and identification is a challenging task in today's world due to the prevalence of many digitized multi‐lingual documents, images, and videos. A valuable method for detecting multi‐lingual text from natural scene images is proposed which uses the convolutional neural network, namely, You Only Look Once (YOLOv3) as the backbone. 
The proposed system is more agile than YOLOv3 with the introduction of atrous separable convolution (ASC). The multi‐scale prediction in YOLOv3 emphasizes the integration of global features of multi‐scale convolutional layers while it overlooks the blend of the multi‐scale local region features on the same convolutional layer. To overcome this, ASC is applied to efficiently compute dense local region feature maps, thereby reducing computation complexity substantially. Complete IoU loss, which is an accumulation of overlap area, distance, and aspect ratio, is introduced for enhanced accuracy in bounding box regression, wherein IoU designates the measure of overlap between the predicted and the ground truth bounding boxes. The experimental results show that the proposed system is efficacious in detecting multi‐lingual as well as English text from natural scene images.</abstract><cop>Hoboken</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/coin.12467</doi><tpages>24</tpages><orcidid>https://orcid.org/0000-0002-5759-7851</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0824-7935
ispartof	Computational intelligence, 2021-11, Vol.37 (4), p.1803-1826
issn	0824-7935; 1467-8640
language	eng
recordid	cdi_proquest_journals_2602899661
source	EBSCOhost Business Source Complete; Access via Wiley Online Library
subjects	Artificial neural networks; Aspect ratio; atrous separable convolution; complete IoU loss; Feature maps; multi‐lingual text identification; Neural networks; non‐maximal suppression; scene text detection; You Only Look Once
title	Multi‐lingual text detection and identification using agile convolutional neural network
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T07%3A40%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi%E2%80%90lingual%20text%20detection%20and%20identification%20using%20agile%20convolutional%20neural%20network&rft.jtitle=Computational%20intelligence&rft.au=Yegnaraman,%20Aparna&rft.date=2021-11&rft.volume=37&rft.issue=4&rft.spage=1803&rft.epage=1826&rft.pages=1803-1826&rft.issn=0824-7935&rft.eissn=1467-8640&rft_id=info:doi/10.1111/coin.12467&rft_dat=%3Cproquest_cross%3E2602899661%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2602899661&rft_id=info:pmid/&rfr_iscdi=true