Naïve Bayes classifier for Kashmiri word sense disambiguation


Bibliographic details
Published in: Sadhana (Bangalore) 2024-07, Vol. 49 (3), Article 226
Main authors: Mir, Tawseef Ahmad; Lawaye, Aadil Ahmad
Format: Article
Language: English
Online access: Full text
Description
Summary: Many Natural Language Processing (NLP) applications, such as machine translation, document clustering, and information retrieval, make use of Word Sense Disambiguation (WSD). WSD automatically predicts the sense of an ambiguous word that best fits the given context. While humans interpret the meaning of natural language with apparent ease, machines require the processing of huge amounts of data for similar tasks. In this paper, we propose an automatic WSD system for the Kashmiri language based on the Naive Bayes classifier. To the best of our knowledge, this work is the first attempt at developing a WSD system for the Kashmiri language. Bag-of-Words (BoW) and Part-of-Speech (PoS) features are used to build the system. Experiments are carried out on a manually crafted sense-tagged dataset for 60 ambiguous Kashmiri words, selected based on their frequency in the collected raw corpus. The senses used for annotating these words are extracted from Kashmiri WordNet. The performance of the proposed system is measured using accuracy, precision, recall and F-1 measure. The proposed WSD model reported its best performance (accuracy = 89.92%, precision = 0.84, recall = 0.89, F-1 measure = 0.86) when PoS and BoW features were used together.
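The approach outlined in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the window size, the Laplace smoothing constant, the feature naming, and the toy English examples for the word "bank" are all assumptions made for illustration, since the Kashmiri dataset and the paper's exact feature design are not given in this record. The sketch combines BoW and PoS features from a context window around the target word, feeds them to a Naive Bayes classifier, and checks that the reported precision and recall are consistent with the reported F-1 measure.

```python
import math
from collections import Counter, defaultdict


def extract_features(context, target_index, window=2):
    """Build BoW + PoS features from (word, pos_tag) pairs around the target.

    The window size and feature names are illustrative assumptions.
    """
    feats = []
    lo = max(0, target_index - window)
    hi = min(len(context), target_index + window + 1)
    for i in range(lo, hi):
        if i == target_index:
            continue
        word, pos = context[i]
        feats.append("bow=" + word.lower())                    # bag-of-words feature
        feats.append("pos[%+d]=%s" % (i - target_index, pos))  # positional PoS feature
    return feats


class NaiveBayesWSD:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            for f in feats:
                self.feat_counts[sense][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, feats):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = sum(self.feat_counts[sense].values()) + self.alpha * len(self.vocab)
            for f in feats:
                score += math.log((self.feat_counts[sense][f] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (English, PoS-tagged by hand); not from the paper.
tagged = [
    ([("deposit", "VB"), ("money", "NN"), ("in", "IN"), ("the", "DT"),
      ("bank", "NN"), ("today", "NN")], 4, "finance"),
    ([("the", "DT"), ("bank", "NN"), ("approved", "VBD"), ("the", "DT"),
      ("loan", "NN")], 1, "finance"),
    ([("fishing", "VBG"), ("on", "IN"), ("the", "DT"), ("river", "NN"),
      ("bank", "NN")], 4, "river"),
    ([("the", "DT"), ("muddy", "JJ"), ("bank", "NN"), ("of", "IN"),
      ("the", "DT"), ("river", "NN")], 2, "river"),
]
model = NaiveBayesWSD().fit(
    [(extract_features(ctx, idx), sense) for ctx, idx, sense in tagged]
)

river_ctx = [("she", "PRP"), ("walked", "VBD"), ("along", "IN"),
             ("the", "DT"), ("river", "NN"), ("bank", "NN")]
loan_ctx = [("the", "DT"), ("bank", "NN"), ("approved", "VBD"),
            ("my", "PRP$"), ("loan", "NN")]
river_pred = model.predict(extract_features(river_ctx, 5))
loan_pred = model.predict(extract_features(loan_ctx, 1))

# Consistency check on the figures reported in the summary.
reported_f1 = round(f1_score(0.84, 0.89), 2)
```

With precision = 0.84 and recall = 0.89, the harmonic mean rounds to 0.86, which matches the F-1 measure quoted in the summary.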
ISSN: 0973-7677, 0256-2499
DOI: 10.1007/s12046-024-02551-7