An alternative framework for univariate filter based feature selection for text categorization
Published in: | Pattern Recognition Letters 2018-02, Vol.103, p.23-31 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • Introduction of an alternative framework for feature selection.
• Results in a subset of highly relevant features covering all classes uniformly.
• Class-performance-driven inclusion of additional features.
• Automatic elimination of redundancy in the final subset of features.
• Extensive experimentation and comparison with the conventional counterparts.
In this paper, we introduce an alternative framework for selecting the most relevant subset of the original feature set for text categorization. Given a feature set and a local feature evaluation function (such as the chi-square measure or mutual information), the proposed framework ranks features in groups instead of ranking them individually. The group at rank r is more powerful than the group at rank r+1. Each group is a subset of features that together are expected to discriminate every class from every other class. An added advantage of the framework is that it eliminates redundant features automatically during selection, without requiring features to be studied in combination. The framework also helps handle overlapping classes effectively by selecting low-ranked yet powerful features. Extensive experiments were conducted on three benchmark datasets, using four different local feature evaluation functions with Support Vector Machine and Naïve Bayes classifiers, to demonstrate the effectiveness of the proposed framework over the respective conventional counterparts. |
---|---|
ISSN: | 0167-8655 1872-7344 |
DOI: | 10.1016/j.patrec.2017.12.025 |
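
The group-wise ranking described in the summary can be illustrated with a small sketch. This is not the authors' algorithm, only one plausible reading under stated assumptions: each feature already has one local score per class (for example a one-vs-rest chi-square value), and each group is formed by taking, for every class in turn, the best-scoring feature not yet selected. The function name `rank_feature_groups`, the `scores` layout, and the toy numbers are all hypothetical. The paper speaks of discriminating every class from every other class, which may imply pairwise scores; the per-class simplification here is an assumption.

```python
from typing import Dict, List


def rank_feature_groups(scores: Dict[str, List[float]], n_groups: int) -> List[List[str]]:
    """Build ranked groups of features (illustrative sketch, not the paper's method).

    scores maps a feature name to its per-class local scores (assumed layout).
    Group r is built before group r+1, so it holds the stronger features.
    """
    n_classes = len(next(iter(scores.values())))
    remaining = set(scores)
    groups: List[List[str]] = []
    for _ in range(n_groups):
        group: List[str] = []
        for c in range(n_classes):
            if not remaining:
                break
            # Best unused feature for class c; sorting makes ties deterministic.
            best = max(sorted(remaining), key=lambda f: scores[f][c])
            group.append(best)
            # A feature is used at most once across all groups.
            remaining.remove(best)
        if not group:
            break
        groups.append(group)
    return groups


# Toy scores for three classes (sports, politics, finance); values are made up.
scores = {
    "goal":   [0.90, 0.10, 0.00],
    "match":  [0.80, 0.20, 0.10],
    "vote":   [0.10, 0.70, 0.00],
    "tax":    [0.10, 0.50, 0.60],
    "stock":  [0.00, 0.20, 0.90],
    "market": [0.00, 0.10, 0.70],
    "the":    [0.05, 0.05, 0.05],
}
groups = rank_feature_groups(scores, n_groups=2)
for rank, group in enumerate(groups, start=1):
    print(rank, group)
```

In this toy run every class contributes one feature to each group, so a class with generally weak scores is still represented, whereas a single global ranking could fill the top positions with features from one dominant class.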