FarsAcademic: A Standard Persian Test Collection for Information Retrieval in Scientific Texts
Published in: | International Journal of Information Science and Management 2023-07, Vol.21 (3), p.187-208 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Many scientific texts are produced in Persian and are available through web-based scientific information databases. This paper presents FarsAcademic, a test collection of Persian scientific texts built to implement information retrieval models for academic search; it comprises 102,238 documents and 61 topics. In constructing FarsAcademic, we tried to resolve problems specific to information retrieval (IR) and natural language processing (NLP) in Persian scientific texts. Domain experts created queries within their research areas, and both user relevance and topical relevance were applied to improve the precision of the document relevance judgments. Further, to improve retrieval performance on Persian scientific texts, automated query expansion was applied using a relevance feedback technique, the Local Context Analysis algorithm. The results showed that query expansion outperformed the other information retrieval models in the Persian scientific text retrieval task. FarsAcademic is the only such collection available free of charge to Iranian information retrieval scholars for implementing and evaluating different information retrieval models and algorithms on Persian scientific text and academic search. |
---|---|
ISSN: | 2008-8302 2008-8310 |
DOI: | 10.22034/ijism.2023.1977626.0 |