A global-ranking local feature selection method for text categorization

Bibliographic Details
Published in: Expert Systems with Applications, December 2012, Vol. 39 (17), pp. 12851-12857
Authors: Pinheiro, Roberto H.W., Cavalcanti, George D.C., Correa, Renato F., Ren, Tsang Ing
Format: Article
Language: English
Online access: Full text
Abstract:
► We propose a filtering method for text categorization called ALOFT.
► The proposed approach automatically finds the optimal number of features.
► ALOFT ensures that each document contributes to the final feature vector.
► ALOFT is fast and deterministic.
► When compared with the VR algorithm, ALOFT obtains better results.

In this paper, we propose a filtering method for feature selection called ALOFT (At Least One FeaTure). The proposed method focuses on specific characteristics of the text categorization domain. It ensures that every document in the training set is represented by at least one feature and that the number of selected features is determined in a data-driven way. We compare the effectiveness of the proposed method with the Variable Ranking method using three text categorization benchmarks (Reuters-21578, 20 Newsgroups and WebKB), two different classifiers (k-Nearest Neighbor and Naïve Bayes) and five feature evaluation functions. The experiments show that ALOFT obtains results equivalent to or better than the classical Variable Ranking method.
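The abstract describes ALOFT only at a high level. As a rough Python sketch, the code below implements one plausible reading of the "at least one feature per document" idea: for each training document, keep the highest-scoring feature it contains according to some feature evaluation function, so the size of the final feature vector follows from the data rather than from a preset cut-off. The function name aloft_like_selection, the example score vector, and the per-document "best present feature" rule are illustrative assumptions, not necessarily the paper's exact procedure.

import numpy as np

def aloft_like_selection(X, scores):
    """Select features so that every document keeps at least one of them.

    X      : (n_documents, n_features) document-term matrix (counts or binary)
    scores : (n_features,) feature evaluation scores (e.g. chi-squared, information gain)
    Returns the sorted indices of the selected features; the number of
    selected features is not fixed in advance but emerges from the data.
    """
    selected = set()
    for doc in X:
        present = np.flatnonzero(doc)                # features occurring in this document
        if present.size == 0:
            continue                                 # an empty document contributes nothing
        best = present[np.argmax(scores[present])]   # its highest-scoring feature
        selected.add(best)                           # guarantees this document is covered
    return np.array(sorted(selected))

# Toy usage: 3 documents, 4 candidate features, hypothetical scores.
X = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])
scores = np.array([0.2, 0.9, 0.5, 0.1])
print(aloft_like_selection(X, scores))               # -> [1 2]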
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2012.05.008