Random forest classifier for multi-category classification of web pages

Bibliographic Details
Main authors: Win Thanda Aung, Khin Hay Mar Saw Hla
Format: Conference proceedings
Language: English
Description
Summary: Web page classification is the automated assignment of predefined subject categories to documents. Automatic Web page classification is one of the most essential techniques for Web mining, given that the Web is a huge repository of varied information, including images and videos, and Web pages need to be categorized to satisfy user needs. Classifying Web pages into categories by hand relies exclusively on manpower, which costs much time and effort. To alleviate this manual classification problem, more researchers are focusing on Web page classification technology. In this paper, we propose a Random Forest (RF) classifier, based on the random forest method, for multi-category Web page classification. The proposed RF classifier can classify Web pages efficiently into their corresponding classes without additional feature selection methods. We compared the accuracy of the proposed approach with that of a decision tree classifier on the same Yahoo Web pages. The experiments show that the proposed approach is suitable for multi-category Web page classification.
DOI:10.1109/APSCC.2009.5394100