A Highest Sense Count Based Method for Disambiguation of Web Queries for Hindi Language Web Information Retrieval
Published in: | International journal of information retrieval research 2012-10, Vol.2 (4), p.1-11 |
---|---|
Main author: | |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Ambiguity in word senses is recognized as a major challenge for information retrieval systems, and it degrades the performance of every natural language processing (NLP) application. Hindi language web information retrieval, like retrieval in other languages, is affected by it. This paper formalizes a sense disambiguation approach to improve the performance of Hindi web information retrieval. The system performs ambiguity detection before disambiguating web queries. A test sample of 100 queries was selected; ambiguity detection found 43% of them to be unambiguous. The remaining queries are disambiguated with an approach based on HSC (Highest Sense Count), and the disambiguated queries are then expanded. Each expanded query generates a new result set with high precision and a high similarity score. The 57 expanded queries were tested against 1000 test document instances. The overall improvement is 45% in average precision and 23% in interpolated average precision, along with a significant improvement in the average similarity score of the newly generated result set. The overall accuracy of the approach is 61.4%, and it improves the performance of the system by 45%. |
---|---|
ISSN: | 2155-6377 2155-6385 |
DOI: | 10.4018/ijirr.2012100101 |
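
The summary reports gains in average precision and interpolated average precision but does not define either measure. The sketch below shows how the two are commonly computed in information retrieval. It assumes the standard definitions, including 11-point interpolation, which may differ from the exact variant used in the article. The function names and document IDs are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at this rank
    return total / len(relevant)


def interpolated_average_precision(
    ranked: Sequence[str], relevant: Set[str], levels: int = 11
) -> float:
    """Average of interpolated precision at evenly spaced recall levels.

    Interpolated precision at recall r is the maximum precision observed
    at any rank whose recall is >= r (assumed standard definition).
    """
    if not relevant:
        return 0.0
    points = []  # (recall, precision) after each rank
    hits = 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    total = 0.0
    for i in range(levels):
        r = i / (levels - 1)
        total += max((p for rec, p in points if rec >= r), default=0.0)
    return total / levels


# Hypothetical ranking for one expanded query: d1 and d3 are the relevant documents.
ranking = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3"}
print(round(average_precision(ranking, relevant_docs), 4))
print(round(interpolated_average_precision(ranking, relevant_docs), 4))
```

Averaging either value over all expanded queries, before and after expansion, would give the kind of relative improvement the summary reports.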