A Novel Information Theoretic Framework for Finding Semantic Similarity in WordNet

Information content (IC) based measures for finding semantic similarity is gaining preferences day by day. Semantics of concepts can be highly characterized by information theory. The conventional way for calculating IC is based on the probability of appearance of concepts in corpora. Due to data sp...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2016-07
Hauptverfasser: Adhikari, Abhijit, Singh, Shivang, Mondal, Deepjyoti, Dutta, Biswanath, Dutta, Animesh
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Information content (IC) based measures for finding semantic similarity is gaining preferences day by day. Semantics of concepts can be highly characterized by information theory. The conventional way for calculating IC is based on the probability of appearance of concepts in corpora. Due to data sparseness and corpora dependency issues of those conventional approaches, a new corpora independent intrinsic IC calculation measure has evolved. In this paper, we mainly focus on such intrinsic IC model and several topological aspects of the underlying ontology. Accuracy of intrinsic IC calculation and semantic similarity measure rely on these aspects deeply. Based on these analysis we propose an information theoretic framework which comprises an intrinsic IC calculator and a semantic similarity model. Our approach is compared with state of the art semantic similarity measures based on corpora dependent IC calculation as well as intrinsic IC based methods using several benchmark data set. We also compare our model with the related Edge based, Feature based and Distributional approaches. Experimental results show that our intrinsic IC model gives high correlation value when applied to different semantic similarity models. Our proposed semantic similarity model also achieves significant results when embedded with some state of the art IC models including ours.
ISSN:2331-8422