Semantic Term Weighting Representation for Kannada Document Classification
Published in: | Revue d'Intelligence Artificielle 2024-08, Vol.38 (4), p.1243-1253 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | eng ; fre |
Subjects: | |
Online Access: | Full text |
Abstract: | In Natural Language Processing, the sequence order of terms plays a vital role in document categorization tasks. This positional information aids the semantic analysis of natural language. To address the lack of semantic information in existing term weighting approaches, we propose a semantic term weighting representation. In addition, in response to the need for Indian regional language resources, particularly for the Kannada language, we created a dataset of 11,045 Kannada documents. This dataset is asymmetrical and multilabel. The newly presented semantic term weighting representation techniques, such as Term Frequency-Positional Encoding (TF-PE) and Term Frequency-Inverse Document Frequency-Positional Encoding (TF-IDF-PE), are applied to the proposed dataset. Further, K-fold and standard train-test split experiments are carried out on the dataset. Among the proposed representation techniques, the Unicode-encoded TF-IDF-PE representation performed better than TF-PE. In the 10-fold experiments, the Unicode-encoded TF-IDF-PE representation with the SVM classifier produces a higher average accuracy of 68.62%. |
---|---|
ISSN: | 0992-499X ; 1958-5748 |
DOI: | 10.18280/ria.380418 |