Analysis of N-gram model on Telugu document classification
Main authors: | , , , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Document classification is a recent research area that has emerged from the exponential growth in the volume of electronic documents. Various document representation methods based on linguistic knowledge have been revisited in the literature, and adapting N-gram models to different languages is a recent trend. This paper analyzes a character N-gram model on Telugu documents. It describes syllable tokenization and the associated complexity of the Telugu script, and discusses a combination of a Bayes probabilistic classifier with the character N-gram model. The performance of the proposed classifier is evaluated in terms of overall accuracy. |
---|---|
ISSN: | 1089-778X 1941-0026 |
DOI: | 10.1109/CEC.2008.4631231 |
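
The summary names a combination of a Bayes probabilistic classifier and a character N-gram model but gives no implementation details. The following is a minimal sketch of that general idea, not the paper's method. It rests on several assumptions: N-grams are taken over Unicode code points rather than over Telugu syllables (the paper's syllable tokenization is not reproduced here), the classifier is a multinomial naive Bayes with add-one smoothing, and the names `char_ngrams` and `NgramNaiveBayes`, the default `n=3`, and the toy labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping n-grams over Unicode code points.

    Assumption: code points stand in for the syllable units
    that a Telugu-aware tokenizer would produce.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                      # n-gram length
        self.alpha = alpha              # additive smoothing constant
        self.class_docs = Counter()     # documents seen per class
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()              # all n-grams seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = char_ngrams(text, self.n)
            self.class_docs[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_docs.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A classifier built this way would be trained with `NgramNaiveBayes(n=3).fit(docs, labels)` and queried with `predict(text)`. Overall accuracy, the measure the summary mentions, would then be the fraction of held-out documents whose predicted label matches the true label.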