A comparison of the performance of SVM and ARNI on Text Categorization with new filtering measures on an unbalanced collection
Main authors: | |
---|---|
Format: | Conference proceedings |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | Text Categorization (TC) is the process of assigning documents to a set of predefined categories. Because of the large amount of information now available, much research aims to automate this time-consuming task, and Machine Learning (ML) algorithms have recently been applied for this purpose. In this paper, we compare the performance of two of these algorithms (SVM and ARNI) on a collection in which documents are unevenly distributed across categories. Before classification, feature reduction is applied using both classical measures (information gain and term frequency) and three new measures that we propose here for the first time, and we also compare the performance of these filtering measures. |
---|---|
ISSN: | 0302-9743 (print); 1611-3349 (electronic) |
DOI: | 10.1007/3-540-44869-1_94 |
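The abstract names information gain (IG) and term frequency (TF) as the classical feature-reduction measures. The sketch below shows one common textbook formulation of each, assuming a binary present/absent term feature for IG, that is IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], and total occurrences in the collection for TF. It is not the paper's implementation, and the three new measures proposed in the paper are not reproduced. The function names (`information_gain`, `collection_tf`, `select_top_k`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """Hypothetical helper: IG of a binary present/absent term feature,
    IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]."""
    categories = sorted(set(labels))
    n = len(docs)
    # Category counts among documents that contain / lack the term.
    with_t = Counter(c for d, c in zip(docs, labels) if term in d)
    without_t = Counter(c for d, c in zip(docs, labels) if term not in d)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(c) for c in categories])
    h_cond = 0.0
    if n_t > 0:
        h_cond += (n_t / n) * entropy([with_t[c] for c in categories])
    if n - n_t > 0:
        h_cond += ((n - n_t) / n) * entropy([without_t[c] for c in categories])
    return h_c - h_cond


def collection_tf(docs, term):
    """Hypothetical helper: TF as total occurrences of the term in the collection."""
    return sum(d.count(term) for d in docs)


def select_top_k(vocab, score, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]


# Hypothetical toy collection: each document is a list of tokens.
docs = [["cheap", "pills"], ["cheap", "offer"],
        ["meeting", "agenda"], ["meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
vocab = sorted({t for d in docs for t in d})

by_ig = select_top_k(vocab, lambda t: information_gain(docs, labels, t), 2)
by_tf = select_top_k(vocab, lambda t: collection_tf(docs, t), 2)
print(by_ig)  # ['cheap', 'meeting']
print(by_tf)  # ['cheap', 'meeting']
```

On this balanced toy data both measures keep the same terms. TF ignores the category labels entirely, so on a collection with an unbalanced category distribution it tends to favor terms from the larger categories, whereas IG scores each term by how much it reduces uncertainty about the category.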