Enhancing the identification of web genres by combining internal and external structures



Bibliographic details
Published in: Pattern recognition letters 2021-06, Vol.146, p.83-89
Main author: Jebari, Chaker
Format: Article
Language: English
Online access: Full text
Description
Summary:
• We propose to use the terms extracted from the internal and external structures of a web page to identify the web genre.
• We propose an improved evidential combination method to combine the evidence assigned to each genre by different classifiers.
• The new combination method exploits the rank of each genre returned by each classifier to adjust its evidence.
• We compared the proposed combination method with many other evidential combination methods and with OWA operators.
• We compared the proposed method with other ensemble classifiers.

Automating the identification of web page genres has become a promising research area in web page classification, as it can be used to improve the quality of web search results and to reduce search time. Many approaches have been proposed to identify the genre of web pages. They differ in three main respects: the features used, the classification algorithm, and the list of genres used for the evaluation. The main idea of this paper is to combine the predictions produced by different classifiers that use the internal and external structures of a web page. To combine the predictions of the different classifiers, we used different OWA operators and the Dempster-Shafer (DS) combination rule. Moreover, we proposed an improved DS combination method based on the ranks of the predicted genres. Experiments conducted on two well-known datasets (KI-04 and SANTINIS) show that our approach achieves better results than other ensemble classifiers and than previous genre identification work.
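To make the two combination schemes named in the summary more concrete, the following is a minimal Python sketch of a generic OWA operator and of the standard Dempster combination rule. It is not the paper's implementation and does not reproduce the rank-based improvement, whose details are not given in this record. The function names, the `THETA` key standing for the whole frame of discernment (ignorance), the genre labels, and the restriction to masses on single genres plus `THETA` are all assumptions made for illustration.

```python
from typing import Dict, List

THETA = "THETA"  # hypothetical key: mass assigned to "any genre" (ignorance)


def owa(scores: List[float], weights: List[float]) -> float:
    """Ordered weighted average: weights are applied to the scores
    after sorting them in descending order."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def dempster_combine(m1: Dict[str, float], m2: Dict[str, float]) -> Dict[str, float]:
    """Dempster's rule for two mass functions whose focal elements are
    single genres plus the whole frame THETA (a simplifying assumption)."""
    genres = (set(m1) | set(m2)) - {THETA}
    t1, t2 = m1.get(THETA, 0.0), m2.get(THETA, 0.0)
    combined = {}
    for g in genres:
        a, b = m1.get(g, 0.0), m2.get(g, 0.0)
        # Agreement on g, or one source says g while the other is ignorant.
        combined[g] = a * b + a * t2 + t1 * b
    combined[THETA] = t1 * t2
    total = sum(combined.values())  # equals 1 minus the conflicting mass
    if total == 0:
        raise ValueError("sources are in total conflict")
    return {k: v / total for k, v in combined.items()}


# Two hypothetical classifiers, e.g. one per page structure.
m_internal = {"blog": 0.6, "shop": 0.3, THETA: 0.1}
m_external = {"blog": 0.5, "shop": 0.2, THETA: 0.3}
fused = dempster_combine(m_internal, m_external)
predicted = max((g for g in fused if g != THETA), key=fused.get)
```

With weights `[1, 0, 0]` the OWA operator reduces to the maximum, with `[0, 0, 1]` to the minimum, and with equal weights to the arithmetic mean, which is why a family of OWA operators can be compared against the evidential rule.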
ISSN:0167-8655
1872-7344
DOI:10.1016/j.patrec.2021.03.004