Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features


Bibliographic Details
Main authors: Liuling Dai, Jinwu Hu, WanChun Liu
Format: Conference proceedings
Language: English
Description
Summary: Text categorization is a key problem in text mining. Although there is much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem fine-text-categorization. In this paper, we present an algorithm based on modified CHI square feature selection and rough sets to solve this problem. The features of each category are selected in an aggressive manner. The classification rules are extracted using rough set theory. Experiments on real-world corpora show that our algorithm can clearly improve classification precision and is thus promising.
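
The summary does not say how the CHI square statistic is modified, so the following is only a minimal sketch of the standard, unmodified chi-square term/category score that such feature selection usually starts from. The function names `chi_square` and `top_terms`, the tokenized-document input format, and the tie-breaking rule are assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Textbook chi-square score for a 2x2 term/category table.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no information.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms(docs, labels, category, k):
    """Return the k terms with the highest chi-square score for one category.

    docs is a list of token lists and labels the matching category names
    (a hypothetical input format, not one taken from the paper).
    """
    in_cat, out_cat = Counter(), Counter()
    n_in = 0
    for tokens, label in zip(docs, labels):
        if label == category:
            n_in += 1
            in_cat.update(set(tokens))   # count document frequency, not term frequency
        else:
            out_cat.update(set(tokens))
    n_out = len(docs) - n_in

    scores = {}
    for term in set(in_cat) | set(out_cat):
        a, b = in_cat[term], out_cat[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Because the standard score is symmetric, a term that occurs only outside the category ranks as high as one that occurs only inside it. Selecting features aggressively, as the summary describes, would correspond to choosing a small `k` per category, though the paper's exact criterion is not given here.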
DOI: 10.1109/ISCID.2008.178