A Heuristic Approach for Converting HTML Documents to XML Documents

XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document...
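This record does not describe the heuristics themselves, so the following is only a toy illustration of the general idea in the abstract: discard the HTML elements, keep their character data, and recover an implicit hierarchy. It assumes that heading levels (`h1` to `h6`) signal the hierarchy, which is an assumption made for this sketch and not the authors' method. The names `HierarchySketch`, `html_to_xml`, `document`, `section`, and `data` are all hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption for this sketch only: heading depth stands in for the data hierarchy.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HierarchySketch(HTMLParser):
    """Toy converter: drops HTML tags, keeps text, nests by heading level."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        # Stack of (heading level, XML element); the root sits at level 0.
        self.stack = [(0, self.root)]
        self.in_heading = None  # level of the heading being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.in_heading = HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self.in_heading = None

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return  # skip whitespace-only character data
        if self.in_heading is not None:
            # Close sections at the same or a deeper level, then open a new one.
            while self.stack[-1][0] >= self.in_heading:
                self.stack.pop()
            section = ET.SubElement(self.stack[-1][1], "section", title=text)
            self.stack.append((self.in_heading, section))
        else:
            # Non-heading text becomes a leaf node under the current section.
            ET.SubElement(self.stack[-1][1], "data").text = text


def html_to_xml(html):
    """Return an XML string that contains no HTML element names."""
    parser = HierarchySketch()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


sample = "<h1>Fruit</h1><p>Apples</p><h2>Citrus</h2><p><b>Lemons</b> and limes</p>"
print(html_to_xml(sample))
```

In this sketch, presentational tags such as `<p>` and `<b>` never reach the output; only their text survives, attached to the innermost open section. A real converter would need further heuristics for tables, lists, and documents without headings.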

Bibliographic details
Main authors:	Lim, Seung-Jin ; Ng, Yiu-Kai
Format:	Book chapter
Language:	eng
Subjects:	Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Data Content ; Empty Element ; Empty String ; Exact sciences and technology ; Heuristic Approach ; Information systems. Data bases ; Leaf Node ; Learning and adaptive systems ; Memory organisation. Data processing ; Software
Online access:	Full text

container_end_page	1196
container_issue
container_start_page	1182
container_title
container_volume	1861
creator	Lim, Seung-Jin ; Ng, Yiu-Kai
description	XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.
doi_str_mv	10.1007/3-540-44957-4_79
format	Book Chapter
fullrecord	<record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_pascalfrancis_primary_1381473</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EBC3071603_85_1201</sourcerecordid><originalsourceid>FETCH-LOGICAL-p268t-6bc8c015b44fcf31b33bbd198ff0e1fd8f4d39270385d561c1f35aad3b1c36863</originalsourceid><addsrcrecordid>eNpNkDtPwzAQx81TVNCdMQNrii_n-OyxKo8iFbGAxGY5jg2BkgQ7ReLbkwJC3HLS_3HS_Rg7BT4Dzukc81LwXAhdUi4M6R021aRwFL812mUTkAA5otB7f54k0sT32YQjL3JNAg_ZhKQmgoLoiE1TeuHjYFGMiQlbzLOl38QmDY3L5n0fO-ues9DFbNG1Hz4OTfuULe9vV9lF5zZvvh1SNnTZ43_hhB0Eu05--ruP2cPV5f1ima_urm8W81XeF1INuaycchzKSojgAkKFWFU1aBUC9xBqFUSNuiCOqqxLCQ4CltbWWIFDqSQes7Ofu71Nzq5DtK1rkulj82bjpwFUIAjH2OwnlkanffLRVF33mgxwswVr0IyczDdEswU7FsTv3di9b3wajN823PhbtGv3bPvBx2SQE0iORpUGCg74BfpHdIo</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>book_chapter</recordtype><pqid>EBC3071603_85_1201</pqid></control><display><type>book_chapter</type><title>A Heuristic Approach for Converting HTML Documents to XML Documents</title><source>Springer Books</source><creator>Lim, Seung-Jin ; Ng, Yiu-Kai</creator><contributor>Pereira, Luis M ; Stuckey, Peter J ; Lau, Kung-Kiu ; Palamidessi, Catuscia ; Kerber, Manfred ; Lloyd, John ; Sagiv, Yehoshua ; Furbach, Ulrich ; Dahl, Veronica ; Lloyd, John ; Furbach, Ulrich ; Pereira, Luís Moniz ; Kerber, Manfred ; Palamidessi, Catuscia ; Dahl, Veronica ; Lau, Kung-Kiu ; Sagiv, Yehoshua ; Stuckey, Peter J.</contributor><creatorcontrib>Lim, Seung-Jin ; Ng, Yiu-Kai ; Pereira, Luis M ; Stuckey, Peter J ; Lau, Kung-Kiu ; Palamidessi, Catuscia ; Kerber, Manfred ; Lloyd, John ; Sagiv, Yehoshua ; Furbach, Ulrich ; Dahl, Veronica ; Lloyd, John ; Furbach, Ulrich ; Pereira, Luís Moniz ; Kerber, Manfred ; Palamidessi, Catuscia ; Dahl, Veronica ; Lau, Kung-Kiu ; Sagiv, Yehoshua ; Stuckey, Peter J.</creatorcontrib><description>XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. 
In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783540677970</identifier><identifier>ISBN: 3540677976</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 9783540449577</identifier><identifier>EISBN: 3540449574</identifier><identifier>DOI: 10.1007/3-540-44957-4_79</identifier><identifier>OCLC: 769771277</identifier><identifier>LCCallNum: Q334-342</identifier><language>eng</language><publisher>Germany: Springer Berlin / Heidelberg</publisher><subject>Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Data Content ; Empty Element ; Empty String ; Exact sciences and technology ; Heuristic Approach ; Information systems. Data bases ; Leaf Node ; Learning and adaptive systems ; Memory organisation. 
Data processing ; Software</subject><ispartof>Computational Logic -- CL 2000, 2000, Vol.1861, p.1182-1196</ispartof><rights>Springer-Verlag Berlin Heidelberg 2000</rights><rights>2000 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><relation>Lecture Notes in Computer Science</relation></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://ebookcentral.proquest.com/covers/3071603-l.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/3-540-44957-4_79$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/3-540-44957-4_79$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,779,780,784,789,790,793,4048,4049,27924,38254,41441,42510</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=1381473$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Pereira, Luis M</contributor><contributor>Stuckey, Peter J</contributor><contributor>Lau, Kung-Kiu</contributor><contributor>Palamidessi, Catuscia</contributor><contributor>Kerber, Manfred</contributor><contributor>Lloyd, John</contributor><contributor>Sagiv, Yehoshua</contributor><contributor>Furbach, Ulrich</contributor><contributor>Dahl, Veronica</contributor><contributor>Lloyd, John</contributor><contributor>Furbach, Ulrich</contributor><contributor>Pereira, Luís Moniz</contributor><contributor>Kerber, Manfred</contributor><contributor>Palamidessi, Catuscia</contributor><contributor>Dahl, Veronica</contributor><contributor>Lau, Kung-Kiu</contributor><contributor>Sagiv, Yehoshua</contributor><contributor>Stuckey, Peter J.</contributor><creatorcontrib>Lim, Seung-Jin</creatorcontrib><creatorcontrib>Ng, Yiu-Kai</creatorcontrib><title>A Heuristic Approach for Converting HTML Documents to XML Documents</title><title>Computational Logic -- CL 
2000</title><description>XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Computer science; control theory; systems</subject><subject>Data Content</subject><subject>Empty Element</subject><subject>Empty String</subject><subject>Exact sciences and technology</subject><subject>Heuristic Approach</subject><subject>Information systems. Data bases</subject><subject>Leaf Node</subject><subject>Learning and adaptive systems</subject><subject>Memory organisation. 
Data processing</subject><subject>Software</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783540677970</isbn><isbn>3540677976</isbn><isbn>9783540449577</isbn><isbn>3540449574</isbn><fulltext>true</fulltext><rsrctype>book_chapter</rsrctype><creationdate>2000</creationdate><recordtype>book_chapter</recordtype><recordid>eNpNkDtPwzAQx81TVNCdMQNrii_n-OyxKo8iFbGAxGY5jg2BkgQ7ReLbkwJC3HLS_3HS_Rg7BT4Dzukc81LwXAhdUi4M6R021aRwFL812mUTkAA5otB7f54k0sT32YQjL3JNAg_ZhKQmgoLoiE1TeuHjYFGMiQlbzLOl38QmDY3L5n0fO-ues9DFbNG1Hz4OTfuULe9vV9lF5zZvvh1SNnTZ43_hhB0Eu05--ruP2cPV5f1ima_urm8W81XeF1INuaycchzKSojgAkKFWFU1aBUC9xBqFUSNuiCOqqxLCQ4CltbWWIFDqSQes7Ofu71Nzq5DtK1rkulj82bjpwFUIAjH2OwnlkanffLRVF33mgxwswVr0IyczDdEswU7FsTv3di9b3wajN823PhbtGv3bPvBx2SQE0iORpUGCg74BfpHdIo</recordid><startdate>2000</startdate><enddate>2000</enddate><creator>Lim, Seung-Jin</creator><creator>Ng, Yiu-Kai</creator><general>Springer Berlin / Heidelberg</general><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>FFUUA</scope><scope>IQODW</scope></search><sort><creationdate>2000</creationdate><title>A Heuristic Approach for Converting HTML Documents to XML Documents</title><author>Lim, Seung-Jin ; Ng, Yiu-Kai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p268t-6bc8c015b44fcf31b33bbd198ff0e1fd8f4d39270385d561c1f35aad3b1c36863</frbrgroupid><rsrctype>book_chapters</rsrctype><prefilter>book_chapters</prefilter><language>eng</language><creationdate>2000</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Computer science; control theory; systems</topic><topic>Data Content</topic><topic>Empty Element</topic><topic>Empty String</topic><topic>Exact sciences and technology</topic><topic>Heuristic Approach</topic><topic>Information systems. Data bases</topic><topic>Leaf Node</topic><topic>Learning and adaptive systems</topic><topic>Memory organisation. 
Data processing</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lim, Seung-Jin</creatorcontrib><creatorcontrib>Ng, Yiu-Kai</creatorcontrib><collection>ProQuest Ebook Central - Book Chapters - Demo use only</collection><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lim, Seung-Jin</au><au>Ng, Yiu-Kai</au><au>Pereira, Luis M</au><au>Stuckey, Peter J</au><au>Lau, Kung-Kiu</au><au>Palamidessi, Catuscia</au><au>Kerber, Manfred</au><au>Lloyd, John</au><au>Sagiv, Yehoshua</au><au>Furbach, Ulrich</au><au>Dahl, Veronica</au><au>Lloyd, John</au><au>Furbach, Ulrich</au><au>Pereira, Luís Moniz</au><au>Kerber, Manfred</au><au>Palamidessi, Catuscia</au><au>Dahl, Veronica</au><au>Lau, Kung-Kiu</au><au>Sagiv, Yehoshua</au><au>Stuckey, Peter J.</au><format>book</format><genre>bookitem</genre><ristype>CHAP</ristype><atitle>A Heuristic Approach for Converting HTML Documents to XML Documents</atitle><btitle>Computational Logic -- CL 2000</btitle><seriestitle>Lecture Notes in Computer Science</seriestitle><date>2000</date><risdate>2000</risdate><volume>1861</volume><spage>1182</spage><epage>1196</epage><pages>1182-1196</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783540677970</isbn><isbn>3540677976</isbn><eisbn>9783540449577</eisbn><eisbn>3540449574</eisbn><abstract>XML is rapidly emerging, and yet there still exist numerous HTML documents on the Web. In this paper, we present a heuristic approach for converting HTML documents to XML documents. During the conversion process, we eliminate all the HTML elements in an HTML document from the resulting XML document since these elements are designed for the display of data exclusively, but retain the character data of each element along with the implicit hierarchy among the data. 
The proposed conversion approach extracts the data hierarchy of HTML documents as closely as possible with no human intervention. The approach can be adopted to construct the data hierarchy of an HTML document and to collect data in HTML documents into an XML repository.</abstract><cop>Germany</cop><pub>Springer Berlin / Heidelberg</pub><doi>10.1007/3-540-44957-4_79</doi><oclcid>769771277</oclcid><tpages>15</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0302-9743
ispartof	Computational Logic -- CL 2000, 2000, Vol.1861, p.1182-1196
issn	0302-9743 1611-3349
language	eng
recordid	cdi_pascalfrancis_primary_1381473
source	Springer Books
subjects	Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Data Content ; Empty Element ; Empty String ; Exact sciences and technology ; Heuristic Approach ; Information systems. Data bases ; Leaf Node ; Learning and adaptive systems ; Memory organisation. Data processing ; Software
title	A Heuristic Approach for Converting HTML Documents to XML Documents
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T02%3A06%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=A%20Heuristic%20Approach%20for%20Converting%20HTML%20Documents%20to%20XML%20Documents&rft.btitle=Computational%20Logic%20--%20CL%202000&rft.au=Lim,%20Seung-Jin&rft.date=2000&rft.volume=1861&rft.spage=1182&rft.epage=1196&rft.pages=1182-1196&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540677970&rft.isbn_list=3540677976&rft_id=info:doi/10.1007/3-540-44957-4_79&rft_dat=%3Cproquest_pasca%3EEBC3071603_85_1201%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540449577&rft.eisbn_list=3540449574&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC3071603_85_1201&rft_id=info:pmid/&rfr_iscdi=true