Enhancing the identification of web genres by combining internal and external structures

• We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre. • We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers. • The new combination method exploits the rank of eac...

Detailed description

Bibliographic details
Published in: Pattern recognition letters 2021-06, Vol.146, p.83-89
Main author: Jebari, Chaker
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 89
container_issue
container_start_page 83
container_title Pattern recognition letters
container_volume 146
creator Jebari, Chaker
description • We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre. • We propose an improved evidential combination method to combine the evidence assigned to each genre by different classifiers. • The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence. • We compared the proposed combination method with many other evidential combination methods and with OWA operators. • We compared the proposed method with other ensemble classifiers. Automating the identification of the genre of web pages has become a promising research area in web page classification, as it can be used to improve the quality of web search results and to reduce search time. Many approaches have been proposed to identify the genre of web pages. They differ in three main respects: the features used, the classification algorithm, and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine the predictions of the different classifiers, we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. Experiments conducted on two well-known datasets (KI-04 and SANTINIS) show that our approach achieves better results than other ensemble classifiers and than other genre identification work.
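The description names two standard ways of fusing classifier outputs, the Dempster-Shafer (DS) combination rule and OWA operators. The following is a minimal sketch of both, not the paper's improved rank-based method, which the record does not specify. It assumes each classifier assigns mass only to single genres (no mass on sets of genres); the function names, genre labels, and weights are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


def dempster_combine(m1: Dict[str, float], m2: Dict[str, float]) -> Dict[str, float]:
    """Dempster's rule, restricted to masses on single genres (an assumption).

    With singleton-only masses, two pieces of evidence agree only when they
    name the same genre, so the combined mass is the normalized product.
    """
    genres = set(m1) | set(m2)
    joint = {g: m1.get(g, 0.0) * m2.get(g, 0.0) for g in genres}
    agreement = sum(joint.values())  # equals 1 - K, where K is the conflict
    if agreement == 0.0:
        raise ValueError("total conflict: the two sources share no genre")
    return {g: v / agreement for g, v in joint.items()}


def owa(scores: List[float], weights: List[float]) -> float:
    """Ordered weighted averaging: weights apply to scores sorted descending."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical evidence from two classifiers for one web page.
internal = {"blog": 0.6, "shop": 0.4}
external = {"blog": 0.5, "shop": 0.5}
combined = dempster_combine(internal, external)

# OWA with weights (1, 0, 0) reduces to the maximum score.
best_score = owa([0.2, 0.9, 0.5], [1.0, 0.0, 0.0])
```

Different OWA weight vectors recover familiar aggregators: all weight on the first position gives the maximum, all weight on the last gives the minimum, and equal weights give the arithmetic mean.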
doi_str_mv 10.1016/j.patrec.2021.03.004
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2533404090</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167865521000830</els_id><sourcerecordid>2533404090</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</originalsourceid><addsrcrecordid>eNp9kF9LwzAUxYMoOKffwIeAz61JkzbtiyBj_oGBLwq-hSS93VK2dCapum9vyvbs0-XC75x7z0HolpKcElrd9_leRQ8mL0hBc8JyQvgZmtFaFJlgnJ-jWcJEVldleYmuQugJIRVr6hn6XLqNcsa6NY4bwLYFF21njYp2cHjo8A9ovAbnIWB9wGbYaesm2roI3qktVq7F8HtaQvSjiWOir9FFp7YBbk5zjj6elu-Ll2z19vy6eFxlhjEeM1Bdo0TJKNVCG1FWQjQU6k4waJUxVUGZrqjoSmF0U7aN4SRFoZpTodpK1WyO7o6-ez98jRCi7Idx-iXIokwnCCcNSRQ_UsYPIXjo5N7bnfIHSYmcOpS9PHYopw4lYTJ1mGQPRxmkBN8WvAzGgjPQ2oRG2Q72f4M_LnZ9GQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2533404090</pqid></control><display><type>article</type><title>Enhancing the identification of web genres by combining internal and external structures</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Jebari, Chaker</creator><creatorcontrib>Jebari, Chaker</creatorcontrib><description>•We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.•We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers.•The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence.•We compared the proposed combination method with many other evidential combination methods and OWA operators as well.•We compared the proposed method with other ensemble classifiers. Automating the identification of the genre of web pages becomes a promising research area in web pages classification, as it can be used to improve the quality of the web search result and to reduce search time. 
Many studies have been proposed to identify the genre of web pages. These studies differ with respect to three main factors which are the features used, the classification algorithm and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine the predictions of the different classifiers we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. The experiments conducted using the two known datasets (KI-04 and SANTINIS), show that our study achieves better results in comparison with other ensemble classifiers and genre identification works as well.</description><identifier>ISSN: 0167-8655</identifier><identifier>EISSN: 1872-7344</identifier><identifier>DOI: 10.1016/j.patrec.2021.03.004</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Algorithms ; Classification ; Classifiers ; Combination ; Dempster Shafer theory ; OWA operators ; Web genre identification ; Websites</subject><ispartof>Pattern recognition letters, 2021-06, Vol.146, p.83-89</ispartof><rights>2021 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Jun 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</citedby><cites>FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.patrec.2021.03.004$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Jebari, Chaker</creatorcontrib><title>Enhancing the identification of web genres by combining internal and external structures</title><title>Pattern recognition letters</title><description>•We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.•We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers.•The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence.•We compared the proposed combination method with many other evidential combination methods and OWA operators as well.•We compared the proposed method with other ensemble classifiers. Automating the identification of the genre of web pages becomes a promising research area in web pages classification, as it can be used to improve the quality of the web search result and to reduce search time. Many studies have been proposed to identify the genre of web pages. These studies differ with respect to three main factors which are the features used, the classification algorithm and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. 
To combine the predictions of the different classifiers we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. The experiments conducted using the two known datasets (KI-04 and SANTINIS), show that our study achieves better results in comparison with other ensemble classifiers and genre identification works as well.</description><subject>Algorithms</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Combination</subject><subject>Dempster Shafer theory</subject><subject>OWA operators</subject><subject>Web genre identification</subject><subject>Websites</subject><issn>0167-8655</issn><issn>1872-7344</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kF9LwzAUxYMoOKffwIeAz61JkzbtiyBj_oGBLwq-hSS93VK2dCapum9vyvbs0-XC75x7z0HolpKcElrd9_leRQ8mL0hBc8JyQvgZmtFaFJlgnJ-jWcJEVldleYmuQugJIRVr6hn6XLqNcsa6NY4bwLYFF21njYp2cHjo8A9ovAbnIWB9wGbYaesm2roI3qktVq7F8HtaQvSjiWOir9FFp7YBbk5zjj6elu-Ll2z19vy6eFxlhjEeM1Bdo0TJKNVCG1FWQjQU6k4waJUxVUGZrqjoSmF0U7aN4SRFoZpTodpK1WyO7o6-ez98jRCi7Idx-iXIokwnCCcNSRQ_UsYPIXjo5N7bnfIHSYmcOpS9PHYopw4lYTJ1mGQPRxmkBN8WvAzGgjPQ2oRG2Q72f4M_LnZ9GQ</recordid><startdate>202106</startdate><enddate>202106</enddate><creator>Jebari, Chaker</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TK</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>202106</creationdate><title>Enhancing the identification of web genres by combining internal and external structures</title><author>Jebari, 
Chaker</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-eaf9a75311b7bc7567791e8f73edacc6213b617f57cb95d9c400161b417ad6a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Combination</topic><topic>Dempster Shafer theory</topic><topic>OWA operators</topic><topic>Web genre identification</topic><topic>Websites</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jebari, Chaker</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Pattern recognition letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jebari, Chaker</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhancing the identification of web genres by combining internal and external structures</atitle><jtitle>Pattern recognition letters</jtitle><date>2021-06</date><risdate>2021</risdate><volume>146</volume><spage>83</spage><epage>89</epage><pages>83-89</pages><issn>0167-8655</issn><eissn>1872-7344</eissn><abstract>•We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.•We propose an improved evidential combination method to combine the evidences assigned to each genre by different classifiers.•The new combination method exploits the rank of 
each genre returned by each classifier to adjust its evidence.•We compared the proposed combination method with many other evidential combination methods and OWA operators as well.•We compared the proposed method with other ensemble classifiers. Automating the identification of the genre of web pages becomes a promising research area in web pages classification, as it can be used to improve the quality of the web search result and to reduce search time. Many studies have been proposed to identify the genre of web pages. These studies differ with respect to three main factors which are the features used, the classification algorithm and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine the predictions of the different classifiers we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. The experiments conducted using the two known datasets (KI-04 and SANTINIS), show that our study achieves better results in comparison with other ensemble classifiers and genre identification works as well.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.patrec.2021.03.004</doi><tpages>7</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0167-8655
ispartof Pattern recognition letters, 2021-06, Vol.146, p.83-89
issn 0167-8655
1872-7344
language eng
recordid cdi_proquest_journals_2533404090
source Elsevier ScienceDirect Journals Complete
subjects Algorithms
Classification
Classifiers
Combination
Dempster Shafer theory
OWA operators
Web genre identification
Websites
title Enhancing the identification of web genres by combining internal and external structures
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T19%3A44%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20the%20identification%20of%20web%20genres%20by%20combining%20internal%20and%20external%20structures&rft.jtitle=Pattern%20recognition%20letters&rft.au=Jebari,%20Chaker&rft.date=2021-06&rft.volume=146&rft.spage=83&rft.epage=89&rft.pages=83-89&rft.issn=0167-8655&rft.eissn=1872-7344&rft_id=info:doi/10.1016/j.patrec.2021.03.004&rft_dat=%3Cproquest_cross%3E2533404090%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2533404090&rft_id=info:pmid/&rft_els_id=S0167865521000830&rfr_iscdi=true