Improving Documents Classification with Semantic Features

Bibliographic Details
Main Authors:	Bai Rujiang, Liao Junhua
Format:	Conference Proceeding
Language:	English
Subjects:	Electronic mail; Frequency; Indexing; Libraries; Machine learning; Ontologies; ontology; RDF; Support vector machine classification; Support vector machines; SVM; Text categorization; text classification; Vocabulary

Description
Summary:	Successful text classification depends heavily on the document representation used. Currently, most approaches to text classification adopt the 'bag-of-words' (BOW) document representation, in which the frequency of occurrence of each word is treated as the most important feature, but this method ignores important semantic relationships between key terms. In this paper, we propose a system that uses ontologies and Natural Language Processing techniques to index texts. The traditional BOW matrix is replaced by a "Bag of Concepts" (BOC). For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. A Support Vector Machine (SVM), a successful machine learning technique, is used for classification. Experimental results show that the proposed method does improve text classification performance significantly.
DOI:	10.1109/ISECS.2009.231
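
Illustrative note: the summary contrasts a bag-of-words representation with a "Bag of Concepts" built by mapping keywords to ontology concepts. The minimal sketch below shows that contrast only. It is not the authors' system: the `TERM_TO_CONCEPT` table, the concept names, and the function names are hypothetical stand-ins for a real ontology lookup, and the classification step is omitted.

```python
import re
from collections import Counter

# Hypothetical term-to-concept table (assumption); a real system would
# obtain these mappings from an ontology rather than a hand-written dict.
TERM_TO_CONCEPT = {
    "car": "Vehicle",
    "automobile": "Vehicle",
    "truck": "Vehicle",
    "svm": "Classifier",
    "classifier": "Classifier",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text):
    """Count how often each surface word occurs."""
    return Counter(tokenize(text))

def bag_of_concepts(text, mapping=TERM_TO_CONCEPT):
    """Count how often each mapped concept occurs.

    Tokens with no entry in the mapping are dropped in this sketch.
    """
    return Counter(mapping[t] for t in tokenize(text) if t in mapping)

doc_a = "The car stopped."
doc_b = "An automobile stopped."

# The two documents share no vehicle word at the surface level,
# but both map to the same concept.
print(bag_of_words(doc_a))
print(bag_of_words(doc_b))
print(bag_of_concepts(doc_a))
print(bag_of_concepts(doc_b))
```

In this toy case the word-level counts of the two sentences differ on "car" and "automobile", while their concept-level counts coincide, which is the kind of semantic relationship the summary says a pure word-frequency representation ignores.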