SM03: Evaluation of Feature Selection and Weighting methods for topical Website Multi-class Classification
Main author: | |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Summary: | The repository accompanies a website classification study titled "Evaluation of Feature Selection and Weighting methods for topical Website Multi-class Classification". The main focus of the study is a comprehensive evaluation of state-of-the-art term weighting models in the context of business website classification. The models are decomposed into their local and global components and recombined into 32 hybrid models, representing all viable variations and going beyond what the original authors initially considered. The results showed that multi-class classification performance can be significantly improved when the recently proposed global weighting components, Inverse Gravity Moment and Inverse Class Space Density Frequency, are combined with less frequently studied but highly effective local functions, such as square root Term Frequency and Glasgow. In addition, filter-model feature selection functions based on information theory are empirically evaluated together with web page selection functions for constructing website representations.
The repository provides:
+ Content analysis and other statistics on the datasets used: WebKB's 7-Sector 1997 and WebKB 7-Sector 2018
Reports generated during three stages of experiments:
+ Feature selection function evaluation
+ 32 hybrid term weighting models evaluation
+ Web page selection functions evaluation
Note: the content snippets have been removed from the experiment reports in order to comply with the copyrights of the source websites. Hence, many folders in the reports remain empty.
An experiment report directory normally contains the following:
+ Subdirectories for each fold of cross validation 5-fold[0-5]
directory_readme.txt -- description of the contained files
dt_test_results.xlsx -- classification results, aggregated over the k folds
log.txt -- log output generated by the imbWBI Console Tool
note.txt -- notes on the experiment
In fold subdirectories:
+ Corpus -- subdirectory containing reports on the selected features and the processed corpus
note.txt -- description of the experiment setup |
---|---|
DOI: | 10.17632/zzmp7t8msn.1 |
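The summary above describes decomposing term weighting models into a local and a global component and recombining them into hybrid models. This record does not give the formulas used in the study, so the following is only a minimal sketch of the idea under stated assumptions: the local function is the square root of the raw term frequency, and the global function follows the commonly published Inverse Gravity Moment form, 1 + λ · f₁ / Σ(fᵣ · r), with the per-class frequencies ranked in descending order. The function names, the default λ = 7, and the example frequencies are illustrative and not taken from the repository.

```python
import math


def sqrt_tf(tf):
    """Local component (assumed form): square root of the raw term frequency."""
    return math.sqrt(tf)


def igm(class_freqs, lam=7.0):
    """Global component (assumed form): 1 + lam * f_1 / sum(f_r * r).

    class_freqs holds the term's frequency in each class; the values are
    ranked in descending order, so r = 1 is the class where the term is
    most frequent.
    """
    ranked = sorted(class_freqs, reverse=True)
    denom = sum(f * r for r, f in enumerate(ranked, start=1))
    if denom == 0:
        # Term never occurs in any class: fall back to a neutral weight.
        return 1.0
    return 1.0 + lam * ranked[0] / denom


def hybrid_weight(tf, class_freqs, local=sqrt_tf, global_=igm):
    """Hybrid model: any local function multiplied by any global function."""
    return local(tf) * global_(class_freqs)


# A term concentrated in one class is weighted higher than a term
# spread evenly across all classes.
concentrated = hybrid_weight(4, [10, 0, 0])
uniform = hybrid_weight(4, [5, 5, 5])
print(concentrated, uniform)
```

Passing the local and global functions as parameters mirrors the recombination described in the summary: swapping either argument yields a different hybrid model without changing the rest of the code.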