Document Classification through Building Specified N-Gram
This paper proposed a method to classify textural documents using specified n-gram data set. Human lives in the world where web documents have a great potential and the amount of valuable information has been consistently growing over the year. There is a problem that finding relevant web documents...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This paper proposed a method to classify textural documents using specified n-gram data set. Human lives in the world where web documents have a great potential and the amount of valuable information has been consistently growing over the year. There is a problem that finding relevant web documents corresponding to what users want is more difficult due to the huge amount of web size. For this reason, many approaches have been suggested to overcome this obstacle. The most important task is classifying textural documents into predefined categories. Over the years, many statistical approaches were introduced though, no one can find perfect solution yet. In this paper, we suggest a method for textural document classification using n-gram model. The n-gram data frequency has a great potential to find similarities between documents. For this reason, we construct our own n-gram data sets from research papers. If an unknown document comes to the system, the system will extract n-grams from the given unknown documents. After this step, n-grams from unknown document and n-grams in previous data sets will be compared by proposed similarity measurement. The precision rate of this method comes to 86%. |
---|---|
DOI: | 10.1109/IMIS.2012.142 |