An alternative framework for univariate filter based feature selection for text categorization


Bibliographic Details
Published in: Pattern Recognition Letters 2018-02, Vol. 103, p. 23-31
Authors: Guru, D.S., Suhil, Mahamad, Raju, Lavanya Narayana, Kumar, N. Vinay
Format: Article
Language: English
Online access: Full text
Description
Abstract:
• Introduction of an alternative framework for feature selection.
• Results in a subset of highly relevant features covering all classes uniformly.
• Class-performance-driven inclusion of additional features.
• Automatic elimination of redundancy in the final subset of features.
• Extensive experimentation and comparison with the conventional counterparts.

In this paper, we introduce an alternative framework for selecting a most relevant subset of the original set of features for the purpose of text categorization. Given a feature set and a local feature evaluation function (such as the chi-square measure or mutual information), the proposed framework ranks the features in groups instead of ranking individual features. A group of features with the rth rank is more powerful than the group of features with the (r+1)th rank. Each group is made up of a subset of features that are expected to be capable of discriminating every class from every other class. An added advantage of the proposed framework is that it automatically eliminates redundant features during selection, without requiring the study of features in combination. Further, the proposed framework helps in handling overlapping classes effectively through the selection of low-ranked yet powerful features. Extensive experimentation has been conducted on three benchmark datasets, using four different local feature evaluation functions with Support Vector Machine and Naïve Bayes classifiers, to bring out the effectiveness of the proposed framework over its conventional counterparts.
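The group-wise ranking described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration, not the authors' implementation: it assumes a precomputed feature-by-class relevance matrix (for example, one-vs-rest chi-square scores), and the function name `rank_features_in_groups` and its structure are inventions for this example. Group r is taken as the set of features ranked r-th for at least one class; a feature that tops several classes is kept only once, which mirrors the automatic redundancy elimination the abstract mentions.

```python
import numpy as np

def rank_features_in_groups(scores):
    """Yield groups of feature indices in decreasing order of discriminative power.

    scores : (n_features, n_classes) array of local relevance scores,
             e.g. one-vs-rest chi-square of each feature against each class.
    """
    n_features, n_classes = scores.shape
    # For each class, order features from most to least relevant.
    per_class_order = np.argsort(-scores, axis=0)  # shape (n_features, n_classes)
    seen = set()
    for r in range(n_features):
        group = []
        for c in range(n_classes):
            f = int(per_class_order[r, c])
            if f not in seen:        # a feature already placed in an earlier
                seen.add(f)          # (or the current) group is skipped
                group.append(f)
        if group:
            yield group

# Usage: accumulate whole groups until the desired number of features is reached.
rng = np.random.default_rng(0)
toy_scores = rng.random((20, 4))     # 20 features, 4 classes (toy data)
selected = []
for group in rank_features_in_groups(toy_scores):
    selected.extend(group)
    if len(selected) >= 10:
        break
print(selected[:10])
```

Selecting whole groups rather than the globally top-scored features is what lets every class contribute a representative among the first few selected features, consistent with the abstract's claim of covering all classes uniformly.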
ISSN:0167-8655
1872-7344
DOI:10.1016/j.patrec.2017.12.025