A Highest Sense Count Based Method for Disambiguation of Web Queries for Hindi Language Web Information Retrieval

The ambiguity in word senses has been recognized as a major challenge for the information retrieval systems. Hindi language web information retrieval, like other languages, faces the problem of sense ambiguity. The sense ambiguity problem deteriorates the performance of every natural language proces...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	International journal of information retrieval research 2012-10, Vol.2 (4), p.1-11
1. Verfasser:	Dwivedi, Sanjay K
Format:	Artikel
Sprache:	eng
Schlagworte:	Ambiguity Analysis Computational linguistics Hindi language Information retrieval Information services Information storage and retrieval Language processing Methods Natural language interfaces Natural language processing Online information services Online services Performance enhancement Queries Query expansion Similarity
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The ambiguity in word senses has been recognized as a major challenge for the information retrieval systems. Hindi language web information retrieval, like other languages, faces the problem of sense ambiguity. The sense ambiguity problem deteriorates the performance of every natural language processing (NLP) application. The performance of Hindi language web information retrieval is also affected by it. In this paper, the author formalized an approach for the disambiguation of the senses to improve the performance of Hindi web information retrieval. Our system works in such a way that ambiguity detection has been performed before disambiguation of web queries. Test samples of 100 queries have been selected. When these queries were subjected to ambiguity detection, we found that 43% of them have been detected unambiguous. After ambiguity detection, the disambiguation approach is followed which is based on HSC (Highest Sense Count). Query disambiguation approach further follows query expansion. The expanded query generates the new result set which results into high precision and high similarity score. The 57 expanded queries are tested against 1000 test document instances. The overall improvement is 45% in the average precision, 23% in interpolated average precision and a significant improvement in the average similarity score of the new generated result set. The overall accuracy of our approach has been 61.4% and it improves the performance of the system by 45%.
ISSN:	2155-6377 2155-6385
DOI:	10.4018/ijirr.2012100101