A Heuristic Approach for Converting HTML Documents to XML Documents
XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document...
Main authors: | Lim, Seung-Jin ; Ng, Yiu-Kai |
---|---|
Format: | Book chapter |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1196 |
---|---|
container_issue | |
container_start_page | 1182 |
container_title | |
container_volume | 1861 |
creator | Lim, Seung-Jin ; Ng, Yiu-Kai |
description | XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository. |
doi_str_mv | 10.1007/3-540-44957-4_79 |
format | Book Chapter |
fullrecord | <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_pascalfrancis_primary_1381473</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EBC3071603_85_1201</sourcerecordid><originalsourceid>FETCH-LOGICAL-p268t-6bc8c015b44fcf31b33bbd198ff0e1fd8f4d39270385d561c1f35aad3b1c36863</originalsourceid><addsrcrecordid>eNpNkDtPwzAQx81TVNCdMQNrii_n-OyxKo8iFbGAxGY5jg2BkgQ7ReLbkwJC3HLS_3HS_Rg7BT4Dzukc81LwXAhdUi4M6R021aRwFL812mUTkAA5otB7f54k0sT32YQjL3JNAg_ZhKQmgoLoiE1TeuHjYFGMiQlbzLOl38QmDY3L5n0fO-ues9DFbNG1Hz4OTfuULe9vV9lF5zZvvh1SNnTZ43_hhB0Eu05--ruP2cPV5f1ima_urm8W81XeF1INuaycchzKSojgAkKFWFU1aBUC9xBqFUSNuiCOqqxLCQ4CltbWWIFDqSQes7Ofu71Nzq5DtK1rkulj82bjpwFUIAjH2OwnlkanffLRVF33mgxwswVr0IyczDdEswU7FsTv3di9b3wajN823PhbtGv3bPvBx2SQE0iORpUGCg74BfpHdIo</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>book_chapter</recordtype><pqid>EBC3071603_85_1201</pqid></control><display><type>book_chapter</type><title>A Heuristic Approach for Converting HTML Documents to XML Documents</title><source>Springer Books</source><creator>Lim, Seung-Jin ; Ng, Yiu-Kai</creator><contributor>Pereira, Luis M ; Stuckey, Peter J ; Lau, Kung-Kiu ; Palamidessi, Catuscia ; Kerber, Manfred ; Lloyd, John ; Sagiv, Yehoshua ; Furbach, Ulrich ; Dahl, Veronica ; Lloyd, John ; Furbach, Ulrich ; Pereira, Luís Moniz ; Kerber, Manfred ; Palamidessi, Catuscia ; Dahl, Veronica ; Lau, Kung-Kiu ; Sagiv, Yehoshua ; Stuckey, Peter J.</contributor><creatorcontrib>Lim, Seung-Jin ; Ng, Yiu-Kai ; Pereira, Luis M ; Stuckey, Peter J ; Lau, Kung-Kiu ; Palamidessi, Catuscia ; Kerber, Manfred ; Lloyd, John ; Sagiv, Yehoshua ; Furbach, Ulrich ; Dahl, Veronica ; Lloyd, John ; Furbach, Ulrich ; Pereira, Luís Moniz ; Kerber, Manfred ; Palamidessi, Catuscia ; Dahl, Veronica ; Lau, Kung-Kiu ; Sagiv, Yehoshua ; Stuckey, Peter J.</creatorcontrib><description>XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. 
In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783540677970</identifier><identifier>ISBN: 3540677976</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 9783540449577</identifier><identifier>EISBN: 3540449574</identifier><identifier>DOI: 10.1007/3-540-44957-4_79</identifier><identifier>OCLC: 769771277</identifier><identifier>LCCallNum: Q334-342</identifier><language>eng</language><publisher>Germany: Springer Berlin / Heidelberg</publisher><subject>Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Data Content ; Empty Element ; Empty String ; Exact sciences and technology ; Heuristic Approach ; Information systems. Data bases ; Leaf Node ; Learning and adaptive systems ; Memory organisation. 
Data processing ; Software</subject><ispartof>Computational Logic -- CL 2000, 2000, Vol.1861, p.1182-1196</ispartof><rights>Springer-Verlag Berlin Heidelberg 2000</rights><rights>2000 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><relation>Lecture Notes in Computer Science</relation></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://ebookcentral.proquest.com/covers/3071603-l.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/3-540-44957-4_79$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/3-540-44957-4_79$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,779,780,784,789,790,793,4048,4049,27924,38254,41441,42510</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=1381473$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Pereira, Luis M</contributor><contributor>Stuckey, Peter J</contributor><contributor>Lau, Kung-Kiu</contributor><contributor>Palamidessi, Catuscia</contributor><contributor>Kerber, Manfred</contributor><contributor>Lloyd, John</contributor><contributor>Sagiv, Yehoshua</contributor><contributor>Furbach, Ulrich</contributor><contributor>Dahl, Veronica</contributor><contributor>Lloyd, John</contributor><contributor>Furbach, Ulrich</contributor><contributor>Pereira, Luís Moniz</contributor><contributor>Kerber, Manfred</contributor><contributor>Palamidessi, Catuscia</contributor><contributor>Dahl, Veronica</contributor><contributor>Lau, Kung-Kiu</contributor><contributor>Sagiv, Yehoshua</contributor><contributor>Stuckey, Peter J.</contributor><creatorcontrib>Lim, Seung-Jin</creatorcontrib><creatorcontrib>Ng, Yiu-Kai</creatorcontrib><title>A Heuristic Approach for Converting HTML Documents to XML Documents</title><title>Computational Logic -- CL 
2000</title><description>XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Computer science; control theory; systems</subject><subject>Data Content</subject><subject>Empty Element</subject><subject>Empty String</subject><subject>Exact sciences and technology</subject><subject>Heuristic Approach</subject><subject>Information systems. Data bases</subject><subject>Leaf Node</subject><subject>Learning and adaptive systems</subject><subject>Memory organisation. 
Data processing</subject><subject>Software</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783540677970</isbn><isbn>3540677976</isbn><isbn>9783540449577</isbn><isbn>3540449574</isbn><fulltext>true</fulltext><rsrctype>book_chapter</rsrctype><creationdate>2000</creationdate><recordtype>book_chapter</recordtype><recordid>eNpNkDtPwzAQx81TVNCdMQNrii_n-OyxKo8iFbGAxGY5jg2BkgQ7ReLbkwJC3HLS_3HS_Rg7BT4Dzukc81LwXAhdUi4M6R021aRwFL812mUTkAA5otB7f54k0sT32YQjL3JNAg_ZhKQmgoLoiE1TeuHjYFGMiQlbzLOl38QmDY3L5n0fO-ues9DFbNG1Hz4OTfuULe9vV9lF5zZvvh1SNnTZ43_hhB0Eu05--ruP2cPV5f1ima_urm8W81XeF1INuaycchzKSojgAkKFWFU1aBUC9xBqFUSNuiCOqqxLCQ4CltbWWIFDqSQes7Ofu71Nzq5DtK1rkulj82bjpwFUIAjH2OwnlkanffLRVF33mgxwswVr0IyczDdEswU7FsTv3di9b3wajN823PhbtGv3bPvBx2SQE0iORpUGCg74BfpHdIo</recordid><startdate>2000</startdate><enddate>2000</enddate><creator>Lim, Seung-Jin</creator><creator>Ng, Yiu-Kai</creator><general>Springer Berlin / Heidelberg</general><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>FFUUA</scope><scope>IQODW</scope></search><sort><creationdate>2000</creationdate><title>A Heuristic Approach for Converting HTML Documents to XML Documents</title><author>Lim, Seung-Jin ; Ng, Yiu-Kai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p268t-6bc8c015b44fcf31b33bbd198ff0e1fd8f4d39270385d561c1f35aad3b1c36863</frbrgroupid><rsrctype>book_chapters</rsrctype><prefilter>book_chapters</prefilter><language>eng</language><creationdate>2000</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Computer science; control theory; systems</topic><topic>Data Content</topic><topic>Empty Element</topic><topic>Empty String</topic><topic>Exact sciences and technology</topic><topic>Heuristic Approach</topic><topic>Information systems. Data bases</topic><topic>Leaf Node</topic><topic>Learning and adaptive systems</topic><topic>Memory organisation. 
Data processing</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lim, Seung-Jin</creatorcontrib><creatorcontrib>Ng, Yiu-Kai</creatorcontrib><collection>ProQuest Ebook Central - Book Chapters - Demo use only</collection><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lim, Seung-Jin</au><au>Ng, Yiu-Kai</au><au>Pereira, Luis M</au><au>Stuckey, Peter J</au><au>Lau, Kung-Kiu</au><au>Palamidessi, Catuscia</au><au>Kerber, Manfred</au><au>Lloyd, John</au><au>Sagiv, Yehoshua</au><au>Furbach, Ulrich</au><au>Dahl, Veronica</au><au>Lloyd, John</au><au>Furbach, Ulrich</au><au>Pereira, Luís Moniz</au><au>Kerber, Manfred</au><au>Palamidessi, Catuscia</au><au>Dahl, Veronica</au><au>Lau, Kung-Kiu</au><au>Sagiv, Yehoshua</au><au>Stuckey, Peter J.</au><format>book</format><genre>bookitem</genre><ristype>CHAP</ristype><atitle>A Heuristic Approach for Converting HTML Documents to XML Documents</atitle><btitle>Computational Logic -- CL 2000</btitle><seriestitle>Lecture Notes in Computer Science</seriestitle><date>2000</date><risdate>2000</risdate><volume>1861</volume><spage>1182</spage><epage>1196</epage><pages>1182-1196</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783540677970</isbn><isbn>3540677976</isbn><eisbn>9783540449577</eisbn><eisbn>3540449574</eisbn><abstract>XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. 
The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.</abstract><cop>Germany</cop><pub>Springer Berlin / Heidelberg</pub><doi>10.1007/3-540-44957-4_79</doi><oclcid>769771277</oclcid><tpages>15</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0302-9743 |
ispartof | Computational Logic -- CL 2000, 2000, Vol.1861, p.1182-1196 |
issn | 0302-9743 ; 1611-3349 |
language | eng |
recordid | cdi_pascalfrancis_primary_1381473 |
source | Springer Books |
subjects | Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Data Content ; Empty Element ; Empty String ; Exact sciences and technology ; Heuristic Approach ; Information systems. Data bases ; Leaf Node ; Learning and adaptive systems ; Memory organisation. Data processing ; Software |
title | A Heuristic Approach for Converting HTML Documents to XML Documents |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T02%3A06%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=A%20Heuristic%20Approach%20for%20Converting%20HTML%20Documents%20to%20XML%20Documents&rft.btitle=Computational%20Logic%20--%20CL%202000&rft.au=Lim,%20Seung-Jin&rft.date=2000&rft.volume=1861&rft.spage=1182&rft.epage=1196&rft.pages=1182-1196&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540677970&rft.isbn_list=3540677976&rft_id=info:doi/10.1007/3-540-44957-4_79&rft_dat=%3Cproquest_pasca%3EEBC3071603_85_1201%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540449577&rft.eisbn_list=3540449574&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC3071603_85_1201&rft_id=info:pmid/&rfr_iscdi=true |
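
The abstract describes dropping HTML elements, which serve display only, while retaining each element's character data and the implicit hierarchy among the data. The record does not give the authors' heuristics, so the following is only a minimal illustrative sketch of that general idea, not the method from the chapter. It rests on one assumption that is not taken from the source: that heading levels (`h1` to `h6`) signal the hierarchy. The names `HeadingHierarchyParser`, `html_to_xml`, `document`, `section`, and `data` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption (not from the chapter): heading depth marks the data hierarchy.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingHierarchyParser(HTMLParser):
    """Hypothetical converter: keeps character data, drops HTML tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")
        # Stack of (heading level, XML element); the root sits at level 0.
        self.stack = [(0, self.root)]
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        level = HEADINGS.get(tag)
        if level is None:
            return  # display-only tag: ignored, its text is still kept
        # Leave sections at the same or a deeper level before opening a new one.
        while self.stack[-1][0] >= level:
            self.stack.pop()
        section = ET.SubElement(self.stack[-1][1], "section")
        self.stack.append((level, section))
        self.in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        parent = self.stack[-1][1]
        if self.in_heading:
            # Heading text becomes the title of the current section.
            parent.set("title", (parent.get("title", "") + " " + text).strip())
        else:
            # All other character data is kept as a child of the section.
            ET.SubElement(parent, "data").text = text


def html_to_xml(html: str) -> str:
    """Return an XML string holding the text and heading-based hierarchy."""
    parser = HeadingHierarchyParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    sample = "<h1>Books</h1><p>Intro <b>text</b></p><h2>XML</h2><p>Chapter one</p>"
    print(html_to_xml(sample))
```

In this sketch no HTML tag name reaches the output; only the text and the nesting inferred from headings survive. A real converter would need further rules for tables, lists, and script or style content, which this sketch does not handle.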