The PAGE (Page Analysis and Ground-Truth Elements) Format Framework
There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes P...
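Main authors: | Pletschacher, S ; Antonacopoulos, A |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | Document Analysis ; Joining processes ; Layout ; Optical character recognition software ; Page Representation Formats ; Performance evaluation ; Pipelines ; Text analysis ; XML |
Online access: | Order full text |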
container_end_page | 260 |
---|---|
container_issue | |
container_start_page | 257 |
container_title | |
container_volume | |
creator | Pletschacher, S ; Antonacopoulos, A |
description | There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series. |
doi_str_mv | 10.1109/ICPR.2010.72 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5597587</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5597587</ieee_id><sourcerecordid>5597587</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1712-6a30bf9de2ff9b1c4aa2074774da811eb65c4d60820ffb1d1dbbba8381d090c73</originalsourceid><addsrcrecordid>eNo1js1LwzAchuMXWGdv3rzkqIfO_NKkSY-jtHUwsEg9j6RJXbEfknSM_fcW1NPDwwMvL0IPQNYAJH3ZZtX7mpJFBb1AYSokMMqY4AzYJQqojCESi16hu_9A6TUKgHCIWMLhFoXed5rQRCSCcx6grD5YXG3KHD9V6tPizaj6s-88VqPBpZuOo4lqd5wPOO_tYMfZP-NicoOaceHUYE-T-7pHN63qvQ3_uEIfRV5nr9Hurdxmm13UgAAaJSomuk2NpW2bamiYUpQsdwUzSgJYnfCGmYRIStpWgwGjtVYylmBIShoRr9Dj725nrd1_u25Q7rznPBVcivgH6ehNHQ</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Pletschacher, S ; Antonacopoulos, A</creator><creatorcontrib>Pletschacher, S ; Antonacopoulos, A</creatorcontrib><description>There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. 
The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.</description><identifier>ISSN: 1051-4651</identifier><identifier>ISBN: 1424475422</identifier><identifier>ISBN: 9781424475421</identifier><identifier>EISSN: 2831-7475</identifier><identifier>EISBN: 9781424475414</identifier><identifier>EISBN: 9780769541099</identifier><identifier>EISBN: 1424475414</identifier><identifier>EISBN: 0769541097</identifier><identifier>DOI: 10.1109/ICPR.2010.72</identifier><language>eng</language><publisher>IEEE</publisher><subject>Document Analysis ; Joining processes ; Layout ; Optical character recognition software ; Page Representation Formats ; Performance evaluation ; Pipelines ; Text analysis ; XML</subject><ispartof>2010 20th International Conference on Pattern Recognition, 2010, p.257-260</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1712-6a30bf9de2ff9b1c4aa2074774da811eb65c4d60820ffb1d1dbbba8381d090c73</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5597587$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5597587$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Pletschacher, S</creatorcontrib><creatorcontrib>Antonacopoulos, A</creatorcontrib><title>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</title><title>2010 20th International Conference on Pattern Recognition</title><addtitle>ICPR</addtitle><description>There is a plethora 
of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.</description><subject>Document Analysis</subject><subject>Joining processes</subject><subject>Layout</subject><subject>Optical character recognition software</subject><subject>Page Representation Formats</subject><subject>Performance evaluation</subject><subject>Pipelines</subject><subject>Text analysis</subject><subject>XML</subject><issn>1051-4651</issn><issn>2831-7475</issn><isbn>1424475422</isbn><isbn>9781424475421</isbn><isbn>9781424475414</isbn><isbn>9780769541099</isbn><isbn>1424475414</isbn><isbn>0769541097</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1js1LwzAchuMXWGdv3rzkqIfO_NKkSY-jtHUwsEg9j6RJXbEfknSM_fcW1NPDwwMvL0IPQNYAJH3ZZtX7mpJFBb1AYSokMMqY4AzYJQqojCESi16hu_9A6TUKgHCIWMLhFoXed5rQRCSCcx6grD5YXG3KHD9V6tPizaj6s-88VqPBpZuOo4lqd5wPOO_tYMfZP-NicoOaceHUYE-T-7pHN63qvQ3_uEIfRV5nr9Hurdxmm13UgAAaJSomuk2NpW2bamiYUpQsdwUzSgJYnfCGmYRIStpWgwGjtVYylmBIShoRr9Dj725nrd1_u25Q7rznPBVcivgH6ehNHQ</recordid><startdate>201008</startdate><enddate>201008</enddate><creator>Pletschacher, 
S</creator><creator>Antonacopoulos, A</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201008</creationdate><title>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</title><author>Pletschacher, S ; Antonacopoulos, A</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1712-6a30bf9de2ff9b1c4aa2074774da811eb65c4d60820ffb1d1dbbba8381d090c73</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Document Analysis</topic><topic>Joining processes</topic><topic>Layout</topic><topic>Optical character recognition software</topic><topic>Page Representation Formats</topic><topic>Performance evaluation</topic><topic>Pipelines</topic><topic>Text analysis</topic><topic>XML</topic><toplevel>online_resources</toplevel><creatorcontrib>Pletschacher, S</creatorcontrib><creatorcontrib>Antonacopoulos, A</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pletschacher, S</au><au>Antonacopoulos, A</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</atitle><btitle>2010 20th International Conference on Pattern 
Recognition</btitle><stitle>ICPR</stitle><date>2010-08</date><risdate>2010</risdate><spage>257</spage><epage>260</epage><pages>257-260</pages><issn>1051-4651</issn><eissn>2831-7475</eissn><isbn>1424475422</isbn><isbn>9781424475421</isbn><eisbn>9781424475414</eisbn><eisbn>9780769541099</eisbn><eisbn>1424475414</eisbn><eisbn>0769541097</eisbn><abstract>There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.</abstract><pub>IEEE</pub><doi>10.1109/ICPR.2010.72</doi><tpages>4</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-4651 |
ispartof | 2010 20th International Conference on Pattern Recognition, 2010, p.257-260 |
issn | 1051-4651 ; 2831-7475 |
language | eng |
recordid | cdi_ieee_primary_5597587 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Document Analysis ; Joining processes ; Layout ; Optical character recognition software ; Page Representation Formats ; Performance evaluation ; Pipelines ; Text analysis ; XML |
title | The PAGE (Page Analysis and Ground-Truth Elements) Format Framework |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T05%3A08%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=The%20PAGE%20(Page%20Analysis%20and%20Ground-Truth%20Elements)%20Format%20Framework&rft.btitle=2010%2020th%20International%20Conference%20on%20Pattern%20Recognition&rft.au=Pletschacher,%20S&rft.date=2010-08&rft.spage=257&rft.epage=260&rft.pages=257-260&rft.issn=1051-4651&rft.eissn=2831-7475&rft.isbn=1424475422&rft.isbn_list=9781424475421&rft_id=info:doi/10.1109/ICPR.2010.72&rft_dat=%3Cieee_6IE%3E5597587%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424475414&rft.eisbn_list=9780769541099&rft.eisbn_list=1424475414&rft.eisbn_list=0769541097&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5597587&rfr_iscdi=true |
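
The `fullrecord` field above is plain XML, so its `display` section can be read with a standard XML parser. The following is a minimal sketch, not part of the catalog system: the `RECORD` string is an abbreviated copy containing only elements that appear in the record above, and the helper name `summarize` and the shape of its return value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record above (assumption: only a few of its
# <display> fields are kept so that the sketch stays short).
RECORD = """<record>
  <display>
    <title>The PAGE (Page Analysis and Ground-Truth Elements) Format Framework</title>
    <creator>Pletschacher, S ; Antonacopoulos, A</creator>
    <identifier>ISSN: 1051-4651</identifier>
    <identifier>DOI: 10.1109/ICPR.2010.72</identifier>
    <subject>Document Analysis ; Layout ; XML</subject>
  </display>
</record>"""


def split_values(text):
    """Split a ' ; '-separated field into a list of trimmed values."""
    return [part.strip() for part in text.split(";") if part.strip()]


def summarize(record_xml):
    """Hypothetical helper: collect the main display fields of one record."""
    display = ET.fromstring(record_xml).find("display")

    # Identifiers look like "DOI: 10.1109/..."; group the values by their label,
    # because a label such as ISBN can occur several times in one record.
    identifiers = {}
    for node in display.findall("identifier"):
        label, _, value = node.text.partition(":")
        identifiers.setdefault(label.strip(), []).append(value.strip())

    return {
        "title": display.findtext("title"),
        "creators": split_values(display.findtext("creator", default="")),
        "subjects": split_values(display.findtext("subject", default="")),
        "identifiers": identifiers,
    }


if __name__ == "__main__":
    info = summarize(RECORD)
    print(info["title"])
    print(info["creators"])
    print(info["identifiers"])
```

Splitting on `;` mirrors how the record itself separates multiple creators and subjects within a single element; a different export might use repeated elements instead, in which case `findall` would be the better fit.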