An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues

It is hard to maintain web applications due to rapid changes and the proliferation of various techniques applied to web applications. Several approaches, such as clustering or refactoring web applications, have been suggested to improve their maintainability. The similarity measure is one of the pri...

Full description

Saved in:
Bibliographic details
Published in: Journal of Information Science and Engineering 2011-11, Vol.27 (6), p.1787-1822
Main authors: 鄭羽盛(Woo-Sung Jung), 李銀珠(Eun-Joo Lee), 禹治水(Chi-Su Wu)
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1822
container_issue 6
container_start_page 1787
container_title Journal of Information Science and Engineering
container_volume 27
creator 鄭羽盛(Woo-Sung Jung)
李銀珠(Eun-Joo Lee)
禹治水(Chi-Su Wu)
description It is hard to maintain web applications due to rapid changes and the proliferation of various techniques applied to web applications. Several approaches, such as clustering or refactoring web applications, have been suggested to improve their maintainability. The similarity measure is one of the principal criteria in these approaches. Existing studies on web similarity focused on semantic or context similarity. Most of the existing clone detection techniques concentrated on general applications, not web applications. In this paper, WSIM has been suggested to measure similarity in web applications, based on the usage degree of clues and two linking directions. The similarity clues include page relations, source and target entities, and parameters. WSIM can be classified in three levels and two directions. Six kinds of WSIMs are defined, and each WSIM has its own purpose. Finally, several experiments were conducted on simulated data and real open sources to validate the proposed WSIM.
doi_str_mv 10.6688/JISE.2011.27.6.1
format Article
fullrecord <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_proquest_miscellaneous_1009818806</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><airiti_id>10162364_201111_201303130002_201303130002_1787_1822</airiti_id><sourcerecordid>1009818806</sourcerecordid><originalsourceid>FETCH-LOGICAL-a285t-7b0e8542b6c9b43e58ce76f86d819f21798e6599cb485da6bb8027a6f0c0d3663</originalsourceid><addsrcrecordid>eNpVkD1PwzAQhj2ARPnYGb0gsST4I7GdsZQCrSqBVBASi-U4F3DlJiVOEf33OGoZGE6nkx7d3fsgdElJKoRSN_PZcpoyQmnKZCpSeoRGlFCRMC6yE3QawooQJvIsG6H3cYOnPxvo3Bqa3ng83my61thP3Lf4DnqwPV66tfOmw29Q4mfzAQHfmgAVbhvMkwV8gw-4rf8w1-_wxG8hnKPj2vgAF4d-hl7vpy-Tx2Tx9DCbjBeJYSrvE1kSUHnGSmGLMuOQKwtS1EpUihY1o7JQIPKisGWm8sqIslSESSNqYknFheBn6Hq_Nz7-Fe_2eu2CBe9NA-02aEpIoahSZECvDqgJ1vi6M411QW9ietPtNMukYpLJyM33nHExj9Ordts1MYMeNA4W9WCX0qFxwmNFof8HKpXUVDHGfwEf63PA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1009818806</pqid></control><display><type>article</type><title>An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>鄭羽盛(Woo-Sung Jung) ; 李銀珠(Eun-Joo Lee) ; 禹治水(Chi-Su Wu)</creator><creatorcontrib>鄭羽盛(Woo-Sung Jung) ; 李銀珠(Eun-Joo Lee) ; 禹治水(Chi-Su Wu)</creatorcontrib><description>It is hard to maintain web applications due to rapid changes and the proliferation of various techniques applied to web applications. Several approaches, such as clustering or refactoring web applications, have been suggested to improve their maintainability. The similarity measure is one of the principal criteria in these approaches. Existing studies on web similarity focused on semantic or context similarity. Most of the existing clone detection techniques concentrated on general applications, not web applications. 
In this paper, WSIM has been suggested to measure similarity in web applications, based on the usage degree of clues and two linking directions. The similarity clues include page relations, source and target entities, and parameters. WSIM can be classified in three levels and two directions. Six kinds of WSIMs are defined, and each WSIM has its own purpose. Finally, several experiments were conducted on simulated data and real open sources to validate the proposed WSIM.</description><identifier>ISSN: 1016-2364</identifier><identifier>DOI: 10.6688/JISE.2011.27.6.1</identifier><language>eng</language><publisher>Taipei: 社團法人中華民國計算語言學學會</publisher><subject>Applied sciences ; Clustering ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Criteria ; Exact sciences and technology ; Joining ; Linking ; Programming languages ; Semantics ; Similarity ; Simulation ; Software ; Software engineering ; Websites</subject><ispartof>Journal of Information Science and Engineering, 2011-11, Vol.27 (6), p.1787-1822</ispartof><rights>2015 INIST-CNRS</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=24782727$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>鄭羽盛(Woo-Sung Jung)</creatorcontrib><creatorcontrib>李銀珠(Eun-Joo Lee)</creatorcontrib><creatorcontrib>禹治水(Chi-Su Wu)</creatorcontrib><title>An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues</title><title>Journal of Information Science and Engineering</title><description>It is hard to maintain web applications due to rapid changes and the proliferation of various techniques applied to 
web applications. Several approaches, such as clustering or refactoring web applications, have been suggested to improve their maintainability. The similarity measure is one of the principal criteria in these approaches. Existing studies on web similarity focused on semantic or context similarity. Most of the existing clone detection techniques concentrated on general applications, not web applications. In this paper, WSIM has been suggested to measure similarity in web applications, based on the usage degree of clues and two linking directions. The similarity clues include page relations, source and target entities, and parameters. WSIM can be classified in three levels and two directions. Six kinds of WSIMs are defined, and each WSIM has its own purpose. Finally, several experiments were conducted on simulated data and real open sources to validate the proposed WSIM.</description><subject>Applied sciences</subject><subject>Clustering</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. 
User interface</subject><subject>Criteria</subject><subject>Exact sciences and technology</subject><subject>Joining</subject><subject>Linking</subject><subject>Programming languages</subject><subject>Semantics</subject><subject>Similarity</subject><subject>Simulation</subject><subject>Software</subject><subject>Software engineering</subject><subject>Websites</subject><issn>1016-2364</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><recordid>eNpVkD1PwzAQhj2ARPnYGb0gsST4I7GdsZQCrSqBVBASi-U4F3DlJiVOEf33OGoZGE6nkx7d3fsgdElJKoRSN_PZcpoyQmnKZCpSeoRGlFCRMC6yE3QawooQJvIsG6H3cYOnPxvo3Bqa3ng83my61thP3Lf4DnqwPV66tfOmw29Q4mfzAQHfmgAVbhvMkwV8gw-4rf8w1-_wxG8hnKPj2vgAF4d-hl7vpy-Tx2Tx9DCbjBeJYSrvE1kSUHnGSmGLMuOQKwtS1EpUihY1o7JQIPKisGWm8sqIslSESSNqYknFheBn6Hq_Nz7-Fe_2eu2CBe9NA-02aEpIoahSZECvDqgJ1vi6M411QW9ietPtNMukYpLJyM33nHExj9Ordts1MYMeNA4W9WCX0qFxwmNFof8HKpXUVDHGfwEf63PA</recordid><startdate>20111101</startdate><enddate>20111101</enddate><creator>鄭羽盛(Woo-Sung Jung)</creator><creator>李銀珠(Eun-Joo Lee)</creator><creator>禹治水(Chi-Su Wu)</creator><general>社團法人中華民國計算語言學學會</general><general>Institute of Information Science, Academia sinica</general><scope>188</scope><scope>IQODW</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20111101</creationdate><title>An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues</title><author>鄭羽盛(Woo-Sung Jung) ; 李銀珠(Eun-Joo Lee) ; 禹治水(Chi-Su Wu)</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a285t-7b0e8542b6c9b43e58ce76f86d819f21798e6599cb485da6bb8027a6f0c0d3663</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Applied sciences</topic><topic>Clustering</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems 
and distributed systems. User interface</topic><topic>Criteria</topic><topic>Exact sciences and technology</topic><topic>Joining</topic><topic>Linking</topic><topic>Programming languages</topic><topic>Semantics</topic><topic>Similarity</topic><topic>Simulation</topic><topic>Software</topic><topic>Software engineering</topic><topic>Websites</topic><toplevel>online_resources</toplevel><creatorcontrib>鄭羽盛(Woo-Sung Jung)</creatorcontrib><creatorcontrib>李銀珠(Eun-Joo Lee)</creatorcontrib><creatorcontrib>禹治水(Chi-Su Wu)</creatorcontrib><collection>Airiti Library</collection><collection>Pascal-Francis</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of Information Science and Engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>鄭羽盛(Woo-Sung Jung)</au><au>李銀珠(Eun-Joo Lee)</au><au>禹治水(Chi-Su Wu)</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues</atitle><jtitle>Journal of Information Science and Engineering</jtitle><date>2011-11-01</date><risdate>2011</risdate><volume>27</volume><issue>6</issue><spage>1787</spage><epage>1822</epage><pages>1787-1822</pages><issn>1016-2364</issn><abstract>It is hard to maintain web applications due to rapid changes and the proliferation of various techniques applied to web applications. Several approaches, such as clustering or refactoring web applications, have been suggested to improve their maintainability. 
The similarity measure is one of the principal criteria in these approaches. Existing studies on web similarity focused on semantic or context similarity. Most of the existing clone detection techniques concentrated on general applications, not web applications. In this paper, WSIM has been suggested to measure similarity in web applications, based on the usage degree of clues and two linking directions. The similarity clues include page relations, source and target entities, and parameters. WSIM can be classified in three levels and two directions. Six kinds of WSIMs are defined, and each WSIM has its own purpose. Finally, several experiments were conducted on simulated data and real open sources to validate the proposed WSIM.</abstract><cop>Taipei</cop><pub>社團法人中華民國計算語言學學會</pub><doi>10.6688/JISE.2011.27.6.1</doi><tpages>36</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1016-2364
ispartof Journal of Information Science and Engineering, 2011-11, Vol.27 (6), p.1787-1822
issn 1016-2364
language eng
recordid cdi_proquest_miscellaneous_1009818806
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Applied sciences
Clustering
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Criteria
Exact sciences and technology
Joining
Linking
Programming languages
Semantics
Similarity
Simulation
Software
Software engineering
Websites
title An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T10%3A54%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Experimental%20Approach%20to%20Detect%20Similar%20Web%20Pages%20Based%20on%203-Levels%20of%20Similarity%20Clues&rft.jtitle=Journal%20of%20Information%20Science%20and%20Engineering&rft.au=%E9%84%AD%E7%BE%BD%E7%9B%9B(Woo-Sung%20Jung)&rft.date=2011-11-01&rft.volume=27&rft.issue=6&rft.spage=1787&rft.epage=1822&rft.pages=1787-1822&rft.issn=1016-2364&rft_id=info:doi/10.6688/JISE.2011.27.6.1&rft_dat=%3Cproquest_pasca%3E1009818806%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1009818806&rft_id=info:pmid/&rft_airiti_id=10162364_201111_201303130002_201303130002_1787_1822&rfr_iscdi=true
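
The abstract describes WSIM only in outline: similarity clues (page relations, source and target entities, parameters), three levels of clue usage, and two linking directions, giving six measures. The record does not include the paper's formulas, so the following is a minimal sketch under stated assumptions, not the authors' definition. It assumes that level 1 uses page relations only, level 2 adds source and target entities, level 3 adds parameters, that the two directions are outgoing and incoming links, and that clue sets are compared with the Jaccard index. The `Link` type, the function names, and the sample data are all hypothetical.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One hyperlink or form submission between two pages (hypothetical model)."""
    source_page: str
    source_entity: str    # element in the source page, e.g. "a" or "form"
    target_page: str
    target_entity: str    # what is addressed in the target, e.g. "page"
    params: frozenset     # names of the parameters passed along the link


def clues(page, links, level, direction):
    """Collect the clue tuples of `page` for one level (1..3) and direction.

    ASSUMPTION: level 1 = page relation only, level 2 = plus entities,
    level 3 = plus parameters. The paper may define the levels differently.
    """
    result = set()
    for link in links:
        if direction == "out" and link.source_page == page:
            key = (link.target_page,)
        elif direction == "in" and link.target_page == page:
            key = (link.source_page,)
        else:
            continue
        if level >= 2:
            key += (link.source_entity, link.target_entity)
        if level >= 3:
            key += (link.params,)
        result.add(key)
    return result


def similarity(page_a, page_b, links, level, direction):
    """Jaccard index of the two pages' clue sets (a stand-in, not WSIM itself)."""
    a = clues(page_a, links, level, direction)
    b = clues(page_b, links, level, direction)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sample site used only for illustration.
links = [
    Link("list.php", "a", "view.php", "page", frozenset({"id"})),
    Link("search.php", "a", "view.php", "page", frozenset({"id", "q"})),
    Link("list.php", "form", "edit.php", "page", frozenset({"id"})),
    Link("search.php", "a", "help.php", "page", frozenset()),
]

if __name__ == "__main__":
    # Six measures: three levels times two directions.
    for direction in ("out", "in"):
        for level in (1, 2, 3):
            score = similarity("list.php", "search.php", links, level, direction)
            print(direction, level, round(score, 3))
```

In this sketch, using more clues makes the comparison stricter: two pages that link to the same target look alike at level 1 but stop matching at level 3 once their parameter sets differ. This is one plausible reading of why each of the six measures would serve a different purpose.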