Method and apparatus for structuring documents based on layout, content and collection

Bibliographic details
Main authors: DEJEAN, HERVE; LUX, VERONIKA; RIBEAU, SANDRINE
Format: Patent
Language: eng ; fre ; ger
creator DEJEAN, HERVE
LUX, VERONIKA
RIBEAU, SANDRINE
description A method and apparatus is provided for converting a document in a first format essentially comprising a flat layout structure into a structured document in a hierarchical form in accordance with predetermined attributes identified from the input format. The process comprises fragmenting the input document into a plurality of document content elements in accordance with a predetermined set of document attributes identifiable from the input document format. The content elements are clustered (16) into selective sets having similar document attributes. The clustered sets are validated (18) with reference to textual properties or organizational content common to documents in the collection. The clustered sets are then categorized (20) into predetermined categories comprising structured elements of the structured document format, and the document content elements are organized (22) by hierarchical dependency from the predetermined categories, wherein the organized document elements comprise the desired structured document format.
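The abstract above describes a pipeline of fragmenting, clustering (16), validating (18), categorizing (20) and organizing (22). As an illustration only, the following Python sketch shows one way such a pipeline could be wired together; it is not the patented method. Everything in it is an assumption: the names `Fragment`, `Node`, `cluster`, `validate`, `categorize`, `organize`, `structure` and `outline` are hypothetical, fragments carry only a font size and a bold flag, clustering groups identical attribute tuples, validation uses a made-up rule (the style recurs in at least two documents of the collection and every member is short), and categorization treats the most populous cluster as body text and ranks the rest into heading levels by font size.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Fragment:
    """One content element of the flat input layout (hypothetical attributes)."""
    text: str
    font_size: float
    bold: bool = False

    @property
    def style(self):
        # Assumed clustering key; a real converter would read many more attributes.
        return (self.font_size, self.bold)

@dataclass
class Node:
    """One element of the hierarchical output document."""
    label: str
    text: str = ""
    children: list = field(default_factory=list)

def cluster(fragments):
    """(16) Group fragments that share identical layout attributes."""
    groups = defaultdict(list)
    for frag in fragments:
        groups[frag.style].append(frag)
    return groups

def validate(groups, collection, min_docs=2, max_words=12):
    """(18) Assumed rule: a cluster is a heading candidate only if its style recurs
    in at least `min_docs` documents of the collection and every member is short."""
    doc_freq = Counter(s for doc in collection for s in {f.style for f in doc})
    return {style for style, frags in groups.items()
            if doc_freq[style] >= min_docs
            and all(len(f.text.split()) <= max_words for f in frags)}

def categorize(groups, heading_styles):
    """(20) Treat the most populous cluster as body text; rank the remaining
    validated clusters into heading levels h1, h2, ... by decreasing font size."""
    body = max(groups, key=lambda s: len(groups[s]))
    ranked = sorted((s for s in heading_styles if s != body),
                    key=lambda s: (-s[0], not s[1]))
    return {s: f"h{i + 1}" for i, s in enumerate(ranked)}

def organize(fragments, categories):
    """(22) Nest fragments by hierarchical dependency: a heading closes any open
    section of the same or deeper level; other text becomes a 'p' child."""
    root = Node("doc")
    stack = [(0, root)]  # open sections as (level, node); root level 0 is never popped
    for frag in fragments:
        label = categories.get(frag.style)
        if label is None:
            stack[-1][1].children.append(Node("p", frag.text))
            continue
        level = int(label[1:])
        while stack[-1][0] >= level:
            stack.pop()
        node = Node(label, frag.text)
        stack[-1][1].children.append(node)
        stack.append((level, node))
    return root

def structure(doc, collection):
    """Run all four steps on one document, validating against the collection."""
    groups = cluster(doc)
    return organize(doc, categorize(groups, validate(groups, collection)))

def outline(node, depth=0):
    """Render the tree as indented 'label: text' lines."""
    lines = [] if node.label == "doc" else ["  " * depth + f"{node.label}: {node.text}"]
    child_depth = depth if node.label == "doc" else depth + 1
    for child in node.children:
        lines += outline(child, child_depth)
    return lines

# Usage on a tiny made-up collection. "Figure 1" uses a style seen in only one
# document, so validation rejects it as a heading and it stays a paragraph.
doc_a = [
    Fragment("Introduction", 16, True),
    Fragment("This report describes a flat layout converted into a tree.", 10),
    Fragment("Figure 1", 9),
    Fragment("Background", 12, True),
    Fragment("Earlier work used hand-written rules for each document type.", 10),
    Fragment("Method", 16, True),
    Fragment("Fragments are clustered by their layout attributes.", 10),
]
doc_b = [
    Fragment("Overview", 16, True),
    Fragment("Scope", 12, True),
    Fragment("Only born-digital documents are considered here.", 10),
]
for line in outline(structure(doc_a, [doc_a, doc_b])):
    print(line)
```

Run on this two-document collection, the sketch prints `h1: Introduction` with its paragraph, the `Figure 1` line and an `h2: Background` section nested beneath it, followed by `h1: Method` and its paragraph. A real converter would use richer layout attributes and a genuine clustering algorithm rather than exact matching on two style values.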
format Patent
date 2006-07-26
linktorsrc https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20060726&DB=EPODOC&CC=EP&NR=1679625A3
language eng ; fre ; ger
recordid cdi_epo_espacenet_EP1679625A3
source esp@cenet
subjects CALCULATING
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
title Method and apparatus for structuring documents based on layout, content and collection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A13%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=DEJEAN,%20HERVE&rft.date=2006-07-26&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EEP1679625A3%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true