A Heuristic Approach for Converting HTML Documents to XML Documents

XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document...

Bibliographic details
Main authors: Lim, Seung-Jin; Ng, Yiu-Kai
Format: Book chapter
Language: eng
Subjects:
Online access: Full text
container_end_page 1196
container_issue
container_start_page 1182
container_title
container_volume 1861
creator Lim, Seung-Jin
Ng, Yiu-Kai
description XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.
doi_str_mv 10.1007/3-540-44957-4_79
format Book Chapter
contributor Pereira, Luís Moniz
Stuckey, Peter J.
Lau, Kung-Kiu
Palamidessi, Catuscia
Kerber, Manfred
Lloyd, John
Sagiv, Yehoshua
Furbach, Ulrich
Dahl, Veronica
creationdate 2000
isbn 9783540677970
3540677976
eisbn 9783540449577
3540449574
oclcid 769771277
lccallnum Q334-342
publisher Germany: Springer Berlin / Heidelberg
rights Springer-Verlag Berlin Heidelberg 2000
2000 INIST-CNRS
relation Lecture Notes in Computer Science
toplevel peer_reviewed
online_resources
tpages 15
linktohtml https://link.springer.com/10.1007/3-540-44957-4_79
linktopdf https://link.springer.com/content/pdf/10.1007/3-540-44957-4_79
fulltext fulltext
identifier ISSN: 0302-9743
ispartof Computational Logic -- CL 2000, 2000, Vol.1861, p.1182-1196
issn 0302-9743
1611-3349
language eng
recordid cdi_pascalfrancis_primary_1381473
source Springer Books
subjects Applied sciences
Artificial intelligence
Computer science; control theory; systems
Data Content
Empty Element
Empty String
Exact sciences and technology
Heuristic Approach
Information systems. Data bases
Leaf Node
Learning and adaptive systems
Memory organisation. Data processing
Software
title A Heuristic Approach for Converting HTML Documents to XML Documents
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T02%3A06%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=A%20Heuristic%20Approach%20for%20Converting%20HTML%20Documents%20to%20XML%20Documents&rft.btitle=Computational%20Logic%20--%20CL%202000&rft.au=Lim,%20Seung-Jin&rft.date=2000&rft.volume=1861&rft.spage=1182&rft.epage=1196&rft.pages=1182-1196&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540677970&rft.isbn_list=3540677976&rft_id=info:doi/10.1007/3-540-44957-4_79&rft_dat=%3Cproquest_pasca%3EEBC3071603_85_1201%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540449577&rft.eisbn_list=3540449574&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC3071603_85_1201&rft_id=info:pmid/&rfr_iscdi=true