Word-embedding-based pseudo-relevance feedback for Arabic information retrieval

Bibliographic details
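Published in: Journal of information science, 2019-08, Vol. 45 (4), p. 429-442
Main authors: El Mahdaouy, Abdelkader; El Alaoui, Saïd Ouatik; Gaussier, Eric
Format: Article
Language: English
Keywords: Computer Science; Embedding; Feedback; Information Retrieval; Queries; Query expansion; Relevance feedback; Similarity
Online access: Full text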
container_end_page 442
container_issue 4
container_start_page 429
container_title Journal of information science
container_volume 45
creator El Mahdaouy, Abdelkader
El Alaoui, Saïd Ouatik
Gaussier, Eric
description Pseudo-relevance feedback (PRF) is a highly effective query expansion approach that reformulates queries by selecting expansion terms from the top k pseudo-relevant documents. Although standard PRF models have proven effective at handling vocabulary mismatch between users’ queries and relevant documents, they select expansion terms without considering their similarity to the original query terms. In this article, we propose a method to incorporate word embedding (WE) similarity into PRF models for Arabic information retrieval (IR). The main idea is to select expansion terms using both their distribution in the set of top pseudo-relevant documents and their similarity to the original query terms. Experiments are conducted on the standard Arabic TREC 2001/2002 collection using three neural WE models. The results show that our PRF extensions significantly outperform their baseline PRF models. Moreover, they improved the baseline IR model by 22% in mean average precision (MAP) and by 68% in the robustness index (RI).
doi_str_mv 10.1177/0165551518792210
format Article
publisher London, England: SAGE Publications
date 2019-08-01
tpages 14
rights The Author(s) 2018
Distributed under a Creative Commons Attribution 4.0 International License
toplevel peer_reviewed
orcidid https://orcid.org/0000-0002-8858-3233
https://orcid.org/0000-0003-4281-2472
linktopdf https://journals.sagepub.com/doi/pdf/10.1177/0165551518792210
linktohtml https://journals.sagepub.com/doi/10.1177/0165551518792210
backlink https://hal.science/hal-02132288 (View record in HAL)
identifier ISSN: 0165-5515
ispartof Journal of information science, 2019-08, Vol.45 (4), p.429-442
issn 0165-5515
1741-6485
language eng
recordid cdi_hal_primary_oai_HAL_hal_02132288v1
source SAGE Complete A-Z List
subjects Computer Science
Embedding
Feedback
Information Retrieval
Queries
Query expansion
Relevance feedback
Similarity
title Word-embedding-based pseudo-relevance feedback for Arabic information retrieval
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T07%3A10%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Word-embedding-based%20pseudo-relevance%20feedback%20for%20Arabic%20information%20retrieval&rft.jtitle=Journal%20of%20information%20science&rft.au=El%20Mahdaouy,%20Abdelkader&rft.date=2019-08-01&rft.volume=45&rft.issue=4&rft.spage=429&rft.epage=442&rft.pages=429-442&rft.issn=0165-5515&rft.eissn=1741-6485&rft_id=info:doi/10.1177/0165551518792210&rft_dat=%3Cproquest_hal_p%3E2252079434%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2252079434&rft_id=info:pmid/&rft_sage_id=10.1177_0165551518792210&rfr_iscdi=true
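The abstract describes selecting expansion terms by combining two signals: how a term is distributed across the top pseudo-relevant documents, and how similar its word embedding is to the original query terms. The Python sketch below illustrates that general idea. It is not the authors' PRF extension, and several choices in it are assumptions made for illustration only. The distributional score is a simplified average relative term frequency, not a full PRF model. Query similarity is taken as the mean cosine similarity to the query terms. The two signals are merged by linear interpolation with a parameter `alpha`. The tokens and two-dimensional vectors are toy data. The article itself extends existing PRF models and uses three neural WE models.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prf_weights(feedback_docs):
    """Hypothetical distributional PRF score: each term's relative frequency,
    averaged over the top-k pseudo-relevant documents (a simplified stand-in
    for a real PRF model)."""
    scores = Counter()
    for doc in feedback_docs:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero
        for term, tf in Counter(doc).items():
            scores[term] += tf / len(doc)
    return {term: s / len(feedback_docs) for term, s in scores.items()}


def query_similarity(term, query, embeddings):
    """Mean cosine similarity between a candidate term and the query terms that have vectors."""
    if term not in embeddings:
        return 0.0
    sims = [cosine(embeddings[term], embeddings[q]) for q in query if q in embeddings]
    return sum(sims) / len(sims) if sims else 0.0


def select_expansion_terms(query, feedback_docs, embeddings, n_terms=3, alpha=0.5):
    """Rank non-query terms by alpha * normalized PRF weight + (1 - alpha) * WE similarity.
    The interpolation and the parameter alpha are illustrative assumptions."""
    prf = prf_weights(feedback_docs)
    if not prf:
        return []
    max_w = max(prf.values())
    combined = {}
    for term, w in prf.items():
        if term in query:
            continue  # expansion terms must be new terms
        sim = query_similarity(term, query, embeddings)
        combined[term] = alpha * (w / max_w) + (1 - alpha) * sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


# Toy data (hypothetical): tokenized top-k documents and 2-D "embeddings".
QUERY = ["oil", "price"]
FEEDBACK_DOCS = [
    ["oil", "price", "barrel", "market", "the"],
    ["oil", "export", "barrel", "the", "the"],
    ["price", "market", "football", "the"],
]
EMBEDDINGS = {
    "oil": (1.0, 0.0),
    "price": (0.8, 0.6),
    "barrel": (0.9, 0.1),
    "market": (0.7, 0.7),
    "export": (0.8, 0.2),
    "football": (0.0, 1.0),
    "the": (0.1, -1.0),
}

if __name__ == "__main__":
    # Distribution only: the frequent but unrelated token "the" ranks first.
    print(select_expansion_terms(QUERY, FEEDBACK_DOCS, EMBEDDINGS, alpha=1.0))
    # ['the', 'market', 'barrel']
    # Distribution + embedding similarity: query-related terms are promoted.
    print(select_expansion_terms(QUERY, FEEDBACK_DOCS, EMBEDDINGS, alpha=0.5))
    # ['barrel', 'market', 'export']
```

With `alpha=1.0`, the ranking relies on term distribution alone, which favors a frequent but off-topic token. Mixing in embedding similarity demotes that token and promotes terms close to the query. This mirrors, in a very reduced form, the motivation stated in the abstract.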