An accurate word sense disambiguation system based on weighted lexical features

One of the major challenges in the process of machine translation is word sense disambiguation (WSD), which is defined as choosing the correct meaning of a multi-meaning word in a text. Supervised learning methods are usually used to solve this problem. The disambiguation task is performed using the...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Literary and linguistic computing 2014-04, Vol.29 (1), p.74-88
Hauptverfasser: Rezapour, Abdoreza, Fakhrahmad, Seyed Mostafa, Sadreddini, Mohammad Hadi, Zolghadri Jahromi, Mansoor
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:One of the major challenges in the process of machine translation is word sense disambiguation (WSD), which is defined as choosing the correct meaning of a multi-meaning word in a text. Supervised learning methods are usually used to solve this problem. The disambiguation task is performed using the statistics of the translated documents (as training data) or dual corpora of source and target languages. In this article, we present a supervised learning method for WSD, which is based on K-nearest neighbor algorithm. As the first step, we extract two sets of features: the set of words that have occurred frequently in the text and the set of words surrounding the ambiguous word. In order to improve the classification accuracy, we perform a feature selection process and then propose a feature weighting strategy to tune the classifier. In order to show that the proposed schemes are not language dependent, we apply the suggested schemes to two sets of data, i.e. English and Persian corpora. The evaluation results show that the feature selection and feature weighting strategies have a significant effect on the accuracy of the classification system. The results are also encouraging compared with the state of the art.
ISSN:0268-1145
1477-4615
DOI:10.1093/llc/fqs074