An Efficient Feature Selection Using Hidden Topic in Text Categorization

Bibliographic details
Authors:	Zhiwei Zhang, Xuan-Hieu Phan, S. Horiguchi
Format:	Conference paper
Language:	English
Keywords:	Data analysis; Entropy; Feature selection; Filters; Gain measurement; Information retrieval; Linear discriminant analysis; Machine learning algorithms; Sampling methods; Text categorization; Vocabulary

Description
Abstract:	Text categorization is an important research area in information retrieval. To save storage space and improve accuracy, efficient and effective feature selection methods that reduce the data before analysis are highly desirable. Research on feature selection usually relies on only a single suitable measure, such as information gain. In this paper, we propose a new feature selection method that combines hidden topic analysis with entropy-based feature ranking. Experiments on the well-known Reuters-21578 and Ohsumed datasets show that our method achieves better classification accuracy while reducing the feature dimension dramatically.
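Information gain, the single measure the abstract names as typical of earlier feature-selection work, can be sketched as follows. This is the standard textbook definition computed over binary term presence, IG(C; t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t); it is not the authors' hidden-topic, entropy-based method, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of a binary 'term present' feature with respect to the class.

    docs: list of token sets (one per document); labels: class label per document.
    """
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Expected class entropy after splitting the documents on term presence.
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def rank_terms(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


# Hypothetical toy corpus: two "grain" and two "crude" documents.
docs = [{"wheat", "price"}, {"wheat", "export"}, {"oil", "price"}, {"oil", "barrel"}]
labels = ["grain", "grain", "crude", "crude"]
print(rank_terms(docs, labels, 2))  # ['oil', 'wheat']
```

In this toy corpus "price" occurs equally in both classes and scores zero, while "wheat" and "oil" each separate the classes perfectly and score 1 bit, which is why a filter of this kind can shrink the vocabulary sharply without discarding discriminative terms.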
DOI:	10.1109/WAINA.2008.137