Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization

Bibliographic details
Main authors:	Montañés, E., Combarro, E. F., Díaz, I., Ranilla, J.
Format:	Conference paper
Language:	English
Subjects:	Applied sciences; Artificial intelligence; Computer science, control theory, systems; Data processing. List processing. Character string processing; Exact sciences and technology; Feature Selection; Feature Subset; Information Gain; Information Retrieval; Information systems. Data bases; Memory organisation. Data processing; Software; Speech and sound recognition and synthesis. Linguistics; Target Function

Description
Abstract:	Text Categorization (TC) is an important issue within Information Retrieval (IR). Feature Selection (FS) is a crucial task in TC, because irrelevant features degrade performance. FS is usually performed by selecting the features with the highest scores according to certain measures. However, these approaches have the disadvantage of requiring the number of selected features to be fixed in advance, commonly expressed as the percentage of words removed, which is called the Filtering Level (FL). Consequently, it is common practice to run a set of experiments manually over several FLs chosen to represent the whole range of possible values. This process guarantees neither that any of the chosen FLs is optimal nor even that it approximates the optimum. This paper overcomes this difficulty by proposing a method that automatically determines optimal FLs by solving a univariate maximization problem.
ISSN:	0302-9743 (print), 1611-3349 (online)
DOI:	10.1007/11552253_22
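The abstract describes two steps: keeping the highest-scoring features at a given Filtering Level (FL), and choosing the FL by solving a univariate maximization problem. The following is a minimal Python sketch of both steps under stated assumptions. It assumes that feature scores (for example, Information Gain) are precomputed, that the FL is treated as a continuous percentage, and that the objective is unimodal. This record does not give the paper's actual target function, so `toy_objective` is a hypothetical stand-in. Golden-section search is a generic univariate maximizer used here for illustration and is not necessarily the authors' procedure. All names (`select_features`, `golden_section_max`, `toy_objective`) are hypothetical.

```python
import math
from typing import Callable, Dict, List


def select_features(scores: Dict[str, float], filtering_level: float) -> List[str]:
    """Remove `filtering_level` percent of the features, keeping the highest-scoring ones.

    `scores` maps each feature (e.g. a word) to a precomputed relevance score,
    such as Information Gain; higher means more relevant (an assumption).
    """
    if not 0.0 <= filtering_level <= 100.0:
        raise ValueError("filtering_level must lie in [0, 100]")
    # Rank features from highest to lowest score (the sort is stable, so ties keep input order).
    ranked = sorted(scores, key=lambda feature: scores[feature], reverse=True)
    # Number of features removed at this FL, rounded down.
    n_removed = math.floor(len(ranked) * filtering_level / 100.0)
    return ranked[: len(ranked) - n_removed]


def golden_section_max(
    f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-3
) -> float:
    """Approximate the maximizer of a unimodal function f on [lo, hi] (golden-section search)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)  # left interior probe
    d = a + inv_phi * (b - a)  # right interior probe
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:
            # The maximum lies in [a, d]; the old left probe becomes the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old right probe becomes the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    scores = {"ball": 0.9, "goal": 0.7, "match": 0.5, "the": 0.01}
    print(select_features(scores, 50.0))  # ['ball', 'goal']

    # Hypothetical smooth stand-in for "performance as a function of FL".
    def toy_objective(fl: float) -> float:
        return -((fl - 62.5) ** 2)

    best_fl = golden_section_max(toy_objective, 0.0, 100.0)
    print(f"approximate optimal FL: {best_fl:.2f}%")
```

`select_features` rounds the number of removed features down, so a real objective evaluated on the selected subset is piecewise constant in the FL. A continuous search therefore only approximates the best FL. Golden-section search is also guaranteed to find the maximum only when the objective is unimodal on the search interval.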