The PAGE (Page Analysis and Ground-Truth Elements) Format Framework

There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes P...

Bibliographic details
Main authors: Pletschacher, S; Antonacopoulos, A
Format: Conference proceedings
Language: eng
Subjects:
Online access: Order full text
container_end_page 260
container_issue
container_start_page 257
container_title
container_volume
creator Pletschacher, S
Antonacopoulos, A
description There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.
doi_str_mv 10.1109/ICPR.2010.72
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5597587</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5597587</ieee_id><sourcerecordid>5597587</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1712-6a30bf9de2ff9b1c4aa2074774da811eb65c4d60820ffb1d1dbbba8381d090c73</originalsourceid><addsrcrecordid>eNo1js1LwzAchuMXWGdv3rzkqIfO_NKkSY-jtHUwsEg9j6RJXbEfknSM_fcW1NPDwwMvL0IPQNYAJH3ZZtX7mpJFBb1AYSokMMqY4AzYJQqojCESi16hu_9A6TUKgHCIWMLhFoXed5rQRCSCcx6grD5YXG3KHD9V6tPizaj6s-88VqPBpZuOo4lqd5wPOO_tYMfZP-NicoOaceHUYE-T-7pHN63qvQ3_uEIfRV5nr9Hurdxmm13UgAAaJSomuk2NpW2bamiYUpQsdwUzSgJYnfCGmYRIStpWgwGjtVYylmBIShoRr9Dj725nrd1_u25Q7rznPBVcivgH6ehNHQ</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Pletschacher, S ; Antonacopoulos, A</creator><creatorcontrib>Pletschacher, S ; Antonacopoulos, A</creatorcontrib><description>There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. 
The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.</description><identifier>ISSN: 1051-4651</identifier><identifier>ISBN: 1424475422</identifier><identifier>ISBN: 9781424475421</identifier><identifier>EISSN: 2831-7475</identifier><identifier>EISBN: 9781424475414</identifier><identifier>EISBN: 9780769541099</identifier><identifier>EISBN: 1424475414</identifier><identifier>EISBN: 0769541097</identifier><identifier>DOI: 10.1109/ICPR.2010.72</identifier><language>eng</language><publisher>IEEE</publisher><subject>Document Analysis ; Joining processes ; Layout ; Optical character recognition software ; Page Representation Formats ; Performance evaluation ; Pipelines ; Text analysis ; XML</subject><ispartof>2010 20th International Conference on Pattern Recognition, 2010, p.257-260</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1712-6a30bf9de2ff9b1c4aa2074774da811eb65c4d60820ffb1d1dbbba8381d090c73</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5597587$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5597587$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Pletschacher, S</creatorcontrib><creatorcontrib>Antonacopoulos, A</creatorcontrib><title>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</title><title>2010 20th International Conference on Pattern Recognition</title><addtitle>ICPR</addtitle><description>There is a plethora 
of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.</description><subject>Document Analysis</subject><subject>Joining processes</subject><subject>Layout</subject><subject>Optical character recognition software</subject><subject>Page Representation Formats</subject><subject>Performance evaluation</subject><subject>Pipelines</subject><subject>Text analysis</subject><subject>XML</subject><issn>1051-4651</issn><issn>2831-7475</issn><isbn>1424475422</isbn><isbn>9781424475421</isbn><isbn>9781424475414</isbn><isbn>9780769541099</isbn><isbn>1424475414</isbn><isbn>0769541097</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1js1LwzAchuMXWGdv3rzkqIfO_NKkSY-jtHUwsEg9j6RJXbEfknSM_fcW1NPDwwMvL0IPQNYAJH3ZZtX7mpJFBb1AYSokMMqY4AzYJQqojCESi16hu_9A6TUKgHCIWMLhFoXed5rQRCSCcx6grD5YXG3KHD9V6tPizaj6s-88VqPBpZuOo4lqd5wPOO_tYMfZP-NicoOaceHUYE-T-7pHN63qvQ3_uEIfRV5nr9Hurdxmm13UgAAaJSomuk2NpW2bamiYUpQsdwUzSgJYnfCGmYRIStpWgwGjtVYylmBIShoRr9Dj725nrd1_u25Q7rznPBVcivgH6ehNHQ</recordid><startdate>201008</startdate><enddate>201008</enddate><creator>Pletschacher, 
S</creator><creator>Antonacopoulos, A</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201008</creationdate><title>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</title><author>Pletschacher, S ; Antonacopoulos, A</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1712-6a30bf9de2ff9b1c4aa2074774da811eb65c4d60820ffb1d1dbbba8381d090c73</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Document Analysis</topic><topic>Joining processes</topic><topic>Layout</topic><topic>Optical character recognition software</topic><topic>Page Representation Formats</topic><topic>Performance evaluation</topic><topic>Pipelines</topic><topic>Text analysis</topic><topic>XML</topic><toplevel>online_resources</toplevel><creatorcontrib>Pletschacher, S</creatorcontrib><creatorcontrib>Antonacopoulos, A</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pletschacher, S</au><au>Antonacopoulos, A</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</atitle><btitle>2010 20th International Conference on Pattern 
Recognition</btitle><stitle>ICPR</stitle><date>2010-08</date><risdate>2010</risdate><spage>257</spage><epage>260</epage><pages>257-260</pages><issn>1051-4651</issn><eissn>2831-7475</eissn><isbn>1424475422</isbn><isbn>9781424475421</isbn><eisbn>9781424475414</eisbn><eisbn>9780769541099</eisbn><eisbn>1424475414</eisbn><eisbn>0769541097</eisbn><abstract>There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.</abstract><pub>IEEE</pub><doi>10.1109/ICPR.2010.72</doi><tpages>4</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1051-4651
ispartof 2010 20th International Conference on Pattern Recognition, 2010, p.257-260
issn 1051-4651
2831-7475
language eng
recordid cdi_ieee_primary_5597587
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Document Analysis
Joining processes
Layout
Optical character recognition software
Page Representation Formats
Performance evaluation
Pipelines
Text analysis
XML
title The PAGE (Page Analysis and Ground-Truth Elements) Format Framework
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T05%3A08%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=The%20PAGE%20(Page%20Analysis%20and%20Ground-Truth%20Elements)%20Format%20Framework&rft.btitle=2010%2020th%20International%20Conference%20on%20Pattern%20Recognition&rft.au=Pletschacher,%20S&rft.date=2010-08&rft.spage=257&rft.epage=260&rft.pages=257-260&rft.issn=1051-4651&rft.eissn=2831-7475&rft.isbn=1424475422&rft.isbn_list=9781424475421&rft_id=info:doi/10.1109/ICPR.2010.72&rft_dat=%3Cieee_6IE%3E5597587%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424475414&rft.eisbn_list=9780769541099&rft.eisbn_list=1424475414&rft.eisbn_list=0769541097&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5597587&rfr_iscdi=true