Enhancing the identification of web genres by combining internal and external structures
• We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.
• We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers.
• The new combination method exploits the rank of eac...
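Published in: | Pattern recognition letters 2021-06, Vol.146, p.83-89 |
---|---|
Main author: | Jebari, Chaker |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Classification; Classifiers; Combination; Dempster Shafer theory; OWA operators; Web genre identification; Websites |
Online access: | Full text |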
container_end_page | 89 |
---|---|
container_issue | |
container_start_page | 83 |
container_title | Pattern recognition letters |
container_volume | 146 |
creator | Jebari, Chaker |
description | • We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre. • We propose an improved evidential combination method to combine the evidence assigned to each genre by different classifiers. • The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence. • We compared the proposed combination method with many other evidential combination methods as well as OWA operators. • We compared the proposed method with other ensemble classifiers.
Automating the identification of web page genres has become a promising research area in web page classification, as it can improve the quality of web search results and reduce search time. Many studies have addressed the identification of web page genres. They differ in three main respects: the features used, the classification algorithm, and the list of genres used for evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine these predictions, we used several OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we propose an improved DS combination method based on the ranks of the predicted genres. Experiments conducted on two well-known datasets (KI-04 and SANTINIS) show that our approach achieves better results than other ensemble classifiers and other genre identification works. |
doi_str_mv | 10.1016/j.patrec.2021.03.004 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2533404090</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167865521000830</els_id><sourcerecordid>2533404090</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</originalsourceid><addsrcrecordid>eNp9kF9LwzAUxYMoOKffwIeAz61JkzbtiyBj_oGBLwq-hSS93VK2dCapum9vyvbs0-XC75x7z0HolpKcElrd9_leRQ8mL0hBc8JyQvgZmtFaFJlgnJ-jWcJEVldleYmuQugJIRVr6hn6XLqNcsa6NY4bwLYFF21njYp2cHjo8A9ovAbnIWB9wGbYaesm2roI3qktVq7F8HtaQvSjiWOir9FFp7YBbk5zjj6elu-Ll2z19vy6eFxlhjEeM1Bdo0TJKNVCG1FWQjQU6k4waJUxVUGZrqjoSmF0U7aN4SRFoZpTodpK1WyO7o6-ez98jRCi7Idx-iXIokwnCCcNSRQ_UsYPIXjo5N7bnfIHSYmcOpS9PHYopw4lYTJ1mGQPRxmkBN8WvAzGgjPQ2oRG2Q72f4M_LnZ9GQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2533404090</pqid></control><display><type>article</type><title>Enhancing the identification of web genres by combining internal and external structures</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Jebari, Chaker</creator><creatorcontrib>Jebari, Chaker</creatorcontrib><description>•We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.•We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers.•The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence.•We compared the proposed combination method with many other evidential combination methods and OWA operators as well.•We compared the proposed method with other ensemble classifiers.
Automating the identification of the genre of web pages becomes a promising research area in web pages classification, as it can be used to improve the quality of the web search result and to reduce search time. Many studies have been proposed to identify the genre of web pages. These studies differ with respect to three main factors which are the features used, the classification algorithm and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine the predictions of the different classifiers we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. The experiments conducted using the two known datasets (KI-04 and SANTINIS), show that our study achieves better results in comparison with other ensemble classifiers and genre identification works as well.</description><identifier>ISSN: 0167-8655</identifier><identifier>EISSN: 1872-7344</identifier><identifier>DOI: 10.1016/j.patrec.2021.03.004</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Algorithms ; Classification ; Classifiers ; Combination ; Dempster Shafer theory ; OWA operators ; Web genre identification ; Websites</subject><ispartof>Pattern recognition letters, 2021-06, Vol.146, p.83-89</ispartof><rights>2021 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Jun 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</citedby><cites>FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.patrec.2021.03.004$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Jebari, Chaker</creatorcontrib><title>Enhancing the identification of web genres by combining internal and external structures</title><title>Pattern recognition letters</title><description>•We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.•We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers.•The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence.•We compared the proposed combination method with many other evidential combination methods and OWA operators as well.•We compared the proposed method with other ensemble classifiers.
Automating the identification of the genre of web pages becomes a promising research area in web pages classification, as it can be used to improve the quality of the web search result and to reduce search time. Many studies have been proposed to identify the genre of web pages. These studies differ with respect to three main factors which are the features used, the classification algorithm and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine the predictions of the different classifiers we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. The experiments conducted using the two known datasets (KI-04 and SANTINIS), show that our study achieves better results in comparison with other ensemble classifiers and genre identification works as well.</description><subject>Algorithms</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Combination</subject><subject>Dempster Shafer theory</subject><subject>OWA operators</subject><subject>Web genre identification</subject><subject>Websites</subject><issn>0167-8655</issn><issn>1872-7344</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kF9LwzAUxYMoOKffwIeAz61JkzbtiyBj_oGBLwq-hSS93VK2dCapum9vyvbs0-XC75x7z0HolpKcElrd9_leRQ8mL0hBc8JyQvgZmtFaFJlgnJ-jWcJEVldleYmuQugJIRVr6hn6XLqNcsa6NY4bwLYFF21njYp2cHjo8A9ovAbnIWB9wGbYaesm2roI3qktVq7F8HtaQvSjiWOir9FFp7YBbk5zjj6elu-Ll2z19vy6eFxlhjEeM1Bdo0TJKNVCG1FWQjQU6k4waJUxVUGZrqjoSmF0U7aN4SRFoZpTodpK1WyO7o6-ez98jRCi7Idx-iXIokwnCCcNSRQ_UsYPIXjo5N7bnfIHSYmcOpS9PHYopw4lYTJ1mGQPRxmkBN8WvAzGgjPQ2oRG2Q72f4M_LnZ9GQ</recordid><startdate>202106</startdate><enddate>202106</enddate><creator>Jebari, 
Chaker</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TK</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>202106</creationdate><title>Enhancing the identification of web genres by combining internal and external structures</title><author>Jebari, Chaker</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Combination</topic><topic>Dempster Shafer theory</topic><topic>OWA operators</topic><topic>Web genre identification</topic><topic>Websites</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jebari, Chaker</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Pattern recognition letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jebari, Chaker</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhancing the identification of web genres by combining internal and external structures</atitle><jtitle>Pattern recognition 
letters</jtitle><date>2021-06</date><risdate>2021</risdate><volume>146</volume><spage>83</spage><epage>89</epage><pages>83-89</pages><issn>0167-8655</issn><eissn>1872-7344</eissn><abstract>•We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.•We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers.•The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence.•We compared the proposed combination method with many other evidential combination methods and OWA operators as well.•We compared the proposed method with other ensemble classifiers.
Automating the identification of the genre of web pages becomes a promising research area in web pages classification, as it can be used to improve the quality of the web search result and to reduce search time. Many studies have been proposed to identify the genre of web pages. These studies differ with respect to three main factors which are the features used, the classification algorithm and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine the predictions of the different classifiers we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. The experiments conducted using the two known datasets (KI-04 and SANTINIS), show that our study achieves better results in comparison with other ensemble classifiers and genre identification works as well.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.patrec.2021.03.004</doi><tpages>7</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0167-8655 |
ispartof | Pattern recognition letters, 2021-06, Vol.146, p.83-89 |
issn | 0167-8655; 1872-7344 |
language | eng |
recordid | cdi_proquest_journals_2533404090 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Algorithms; Classification; Classifiers; Combination; Dempster Shafer theory; OWA operators; Web genre identification; Websites |
title | Enhancing the identification of web genres by combining internal and external structures |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T19%3A44%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20the%20identification%20of%20web%20genres%20by%20combining%20internal%20and%20external%20structures&rft.jtitle=Pattern%20recognition%20letters&rft.au=Jebari,%20Chaker&rft.date=2021-06&rft.volume=146&rft.spage=83&rft.epage=89&rft.pages=83-89&rft.issn=0167-8655&rft.eissn=1872-7344&rft_id=info:doi/10.1016/j.patrec.2021.03.004&rft_dat=%3Cproquest_cross%3E2533404090%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2533404090&rft_id=info:pmid/&rft_els_id=S0167865521000830&rfr_iscdi=true |
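
The abstract in this record names two standard ways of fusing classifier outputs: OWA (ordered weighted averaging) operators and the Dempster-Shafer combination rule. The following is a minimal Python sketch of both, offered only as an illustration. It assumes each classifier assigns mass to single genres only (no mass on sets of genres), in which case Dempster's rule reduces to a normalized product. The function names, genre labels, and numbers are hypothetical, and the article's improved rank-based method is not reproduced here because the record does not specify it.

```python
from typing import Dict, List


def owa(scores: List[float], weights: List[float]) -> float:
    """Ordered weighted averaging: sort the scores in descending order,
    then take the weighted sum with position-based weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def dempster_combine(m1: Dict[str, float], m2: Dict[str, float]) -> Dict[str, float]:
    """Dempster's rule for two mass functions whose focal elements are
    single genres only (an assumption of this sketch).

    Two singletons intersect only when they are the same genre, so the
    combined mass is the product of the two masses, renormalized by the
    total non-conflicting mass (1 - K, where K is the conflict)."""
    genres = set(m1) | set(m2)
    agreement = {g: m1.get(g, 0.0) * m2.get(g, 0.0) for g in genres}
    non_conflict = sum(agreement.values())
    if non_conflict == 0.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {g: v / non_conflict for g, v in agreement.items()}


if __name__ == "__main__":
    # Hypothetical evidence from two classifiers, e.g. one trained on
    # internal-structure terms and one on external-structure terms.
    internal = {"blog": 0.6, "eshop": 0.4}
    external = {"blog": 0.5, "eshop": 0.5}
    print(dempster_combine(internal, external))

    # Hypothetical scores for one genre from three classifiers,
    # fused with position-based OWA weights that sum to 1.
    print(owa([0.2, 0.9, 0.5], [0.5, 0.3, 0.2]))
```

OWA weights attach to rank positions rather than to particular classifiers, so `[1, 0, 0]` gives the maximum, `[0, 0, 1]` the minimum, and equal weights the arithmetic mean.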