A neural model for text localization, transcription and named entity recognition in full pages

Highlights:
• The network localizes, transcribes and recognizes named entities in full-page images.
• The model benefits from task interdependence and bi-dimensional structure.
• Exhaustive evaluation on mixed printed and handwritten documents.

Detailed description

Saved in:
Bibliographic details
Published in: Pattern recognition letters 2020-08, Vol.136, p.219-227
Main authors: Carbonell, Manuel; Fornés, Alicia; Villegas, Mauricio; Lladós, Josep
Format: Article
Language: English
Subjects:
Online access: Full text
description In recent years, the consolidation of deep neural network architectures for information extraction from document images has brought substantial improvements in the performance of each of the tasks involved in this process: text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propose an end-to-end model that combines a one-stage object detection network with branches for the recognition of text and of named entities, respectively, so that shared features can be learned simultaneously from the training error of each task. By doing so, the model jointly performs handwritten text detection, transcription, and named entity recognition at page level in a single feed-forward step. We exhaustively evaluate our approach on different datasets, discussing its advantages and limitations compared to sequential approaches. The results show that the model is capable of benefiting from shared features by simultaneously solving interdependent tasks.
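To make the idea in the abstract concrete, below is a minimal PyTorch-style sketch of a shared-backbone multi-task network: one convolutional trunk feeds a one-stage detection head (text localization), a character-logit head (transcription), and an entity-tag head (named entity recognition), so all three tasks train the same features. The class name, layer sizes, and head designs are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only: shared backbone with three task heads, so that
# gradients from detection, transcription and entity tagging all update the
# same feature extractor. Sizes and head designs are assumptions for clarity.
import torch
import torch.nn as nn

class JointPageModel(nn.Module):
    def __init__(self, num_chars=80, num_entity_tags=10, anchors_per_cell=3):
        super().__init__()
        # Shared feature extractor over the full page image.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One-stage detection head: box offsets plus objectness per anchor.
        self.detect_head = nn.Conv2d(128, anchors_per_cell * 5, 1)
        # Transcription head: per-position character logits (CTC-style, +1 blank).
        self.transcribe_head = nn.Conv2d(128, num_chars + 1, 1)
        # Named-entity head: tag logits per anchor.
        self.entity_head = nn.Conv2d(128, anchors_per_cell * num_entity_tags, 1)

    def forward(self, page_image):
        feats = self.backbone(page_image)           # shared features
        boxes = self.detect_head(feats)             # where the text lines are
        char_logits = self.transcribe_head(feats)   # what they say
        entity_logits = self.entity_head(feats)     # which entity category
        return boxes, char_logits, entity_logits

model = JointPageModel()
page = torch.randn(1, 1, 256, 256)                  # one grayscale page image
boxes, char_logits, entity_logits = model(page)     # single feed-forward pass
# Joint training would sum the three task losses, e.g.
# loss = detection_loss + ctc_loss + entity_loss, so the backbone stays shared.

Because all three heads read the same feature map, reducing any one task's loss also refines the representation used by the others, which is the shared-feature benefit the abstract refers to.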
doi 10.1016/j.patrec.2020.05.001
format Article
identifier ISSN: 0167-8655
ispartof Pattern recognition letters, 2020-08, Vol.136, p.219-227
issn 0167-8655
1872-7344
language eng
recordid cdi_proquest_journals_2447010310
source Elsevier ScienceDirect Journals
subjects Artificial neural networks
Computer architecture
Deep neural networks
Document image analysis
Handwriting recognition
Handwritten text recognition
Information extraction
Information retrieval
Localization
Multi-task learning
Named entity recognition
Neural networks
Object recognition
Text detection
Transcription
title A neural model for text localization, transcription and named entity recognition in full pages