Weighting Passages Enhances Accuracy

We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel...

Bibliographic details
Published in:	ACM transactions on information systems 2021-03, Vol.39 (2), p.1-11
Main authors:	Muntean, Cristina Ioana, Nardini, Franco Maria, Perego, Raffaele, Tonellotto, Nicola, Frieder, Ophir
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	11
container_issue	2
container_start_page	1
container_title	ACM transactions on information systems
container_volume	39
creator	Muntean, Cristina Ioana ; Nardini, Franco Maria ; Perego, Raffaele ; Tonellotto, Nicola ; Frieder, Ophir
description	We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.
doi_str_mv	10.1145/3428687
format	Article
fullrecord	<record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3428687</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1145_3428687</sourcerecordid><originalsourceid>FETCH-LOGICAL-c258t-115f94d94506f48cf13dd0b64e71bde5e829d27b73737ac08d671edc57ac1bbb3</originalsourceid><addsrcrecordid>eNotj01LAzEURYMoWKv4F7oQXEXzJl9vlqXUKhTaheJySF6S6YiOktRF_70plru492wuHMZuQTwAKP0oVYMG7RmbgNbIj3Bet1CGIyBesqtSPoSobMSE3b3Hod_th7GfbV0pro9lthx3bqQ65kS_2dHhml0k91nizamn7O1p-bp45uvN6mUxX3NqNO45gE6tCq3SwiSFlECGILxR0YIPUUds2tBYb2WNI4HBWIiBdAXw3sspu___pfxdSo6p-8nDl8uHDkR3lOtOcvIP89M_6A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Weighting Passages Enhances Accuracy</title><source>ACM Digital Library Complete</source><creator>Muntean, Cristina Ioana ; Nardini, Franco Maria ; Perego, Raffaele ; Tonellotto, Nicola ; Frieder, Ophir</creator><creatorcontrib>Muntean, Cristina Ioana ; Nardini, Franco Maria ; Perego, Raffaele ; Tonellotto, Nicola ; Frieder, Ophir</creatorcontrib><description>We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. 
Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.</description><identifier>ISSN: 1046-8188</identifier><identifier>EISSN: 1558-2868</identifier><identifier>DOI: 10.1145/3428687</identifier><language>eng</language><ispartof>ACM transactions on information systems, 2021-03, Vol.39 (2), p.1-11</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c258t-115f94d94506f48cf13dd0b64e71bde5e829d27b73737ac08d671edc57ac1bbb3</citedby><cites>FETCH-LOGICAL-c258t-115f94d94506f48cf13dd0b64e71bde5e829d27b73737ac08d671edc57ac1bbb3</cites><orcidid>0000-0001-5265-1831 ; 0000-0002-7427-1001</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Muntean, Cristina Ioana</creatorcontrib><creatorcontrib>Nardini, Franco Maria</creatorcontrib><creatorcontrib>Perego, Raffaele</creatorcontrib><creatorcontrib>Tonellotto, Nicola</creatorcontrib><creatorcontrib>Frieder, Ophir</creatorcontrib><title>Weighting Passages Enhances Accuracy</title><title>ACM transactions on information systems</title><description>We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. 
Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.</description><issn>1046-8188</issn><issn>1558-2868</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNotj01LAzEURYMoWKv4F7oQXEXzJl9vlqXUKhTaheJySF6S6YiOktRF_70plru492wuHMZuQTwAKP0oVYMG7RmbgNbIj3Bet1CGIyBesqtSPoSobMSE3b3Hod_th7GfbV0pro9lthx3bqQ65kS_2dHhml0k91nizamn7O1p-bp45uvN6mUxX3NqNO45gE6tCq3SwiSFlECGILxR0YIPUUds2tBYb2WNI4HBWIiBdAXw3sspu___pfxdSo6p-8nDl8uHDkR3lOtOcvIP89M_6A</recordid><startdate>20210301</startdate><enddate>20210301</enddate><creator>Muntean, Cristina Ioana</creator><creator>Nardini, Franco Maria</creator><creator>Perego, Raffaele</creator><creator>Tonellotto, Nicola</creator><creator>Frieder, Ophir</creator><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-5265-1831</orcidid><orcidid>https://orcid.org/0000-0002-7427-1001</orcidid></search><sort><creationdate>20210301</creationdate><title>Weighting Passages Enhances Accuracy</title><author>Muntean, Cristina Ioana ; Nardini, Franco Maria ; Perego, Raffaele ; Tonellotto, Nicola ; Frieder, 
Ophir</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c258t-115f94d94506f48cf13dd0b64e71bde5e829d27b73737ac08d671edc57ac1bbb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Muntean, Cristina Ioana</creatorcontrib><creatorcontrib>Nardini, Franco Maria</creatorcontrib><creatorcontrib>Perego, Raffaele</creatorcontrib><creatorcontrib>Tonellotto, Nicola</creatorcontrib><creatorcontrib>Frieder, Ophir</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on information systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Muntean, Cristina Ioana</au><au>Nardini, Franco Maria</au><au>Perego, Raffaele</au><au>Tonellotto, Nicola</au><au>Frieder, Ophir</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Weighting Passages Enhances Accuracy</atitle><jtitle>ACM transactions on information systems</jtitle><date>2021-03-01</date><risdate>2021</risdate><volume>39</volume><issue>2</issue><spage>1</spage><epage>11</epage><pages>1-11</pages><issn>1046-8188</issn><eissn>1558-2868</eissn><abstract>We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. 
We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.</abstract><doi>10.1145/3428687</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0001-5265-1831</orcidid><orcidid>https://orcid.org/0000-0002-7427-1001</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1046-8188
ispartof	ACM transactions on information systems, 2021-03, Vol.39 (2), p.1-11
issn	1046-8188 1558-2868
language	eng
recordid	cdi_crossref_primary_10_1145_3428687
source	ACM Digital Library Complete
title	Weighting Passages Enhances Accuracy
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T04%3A25%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Weighting%20Passages%20Enhances%20Accuracy&rft.jtitle=ACM%20transactions%20on%20information%20systems&rft.au=Muntean,%20Cristina%20Ioana&rft.date=2021-03-01&rft.volume=39&rft.issue=2&rft.spage=1&rft.epage=11&rft.pages=1-11&rft.issn=1046-8188&rft.eissn=1558-2868&rft_id=info:doi/10.1145/3428687&rft_dat=%3Ccrossref%3E10_1145_3428687%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
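
Illustrative sketch (not part of the catalog record): the description field above says that BM25P scores a document through a linear combination of term statistics taken from different portions of the document. The Python sketch below shows one way that idea could look. It assumes, without confirmation from this record, that the weighted sum of per-passage term frequencies simply replaces the term frequency in the standard BM25 formula. The function names, the equal-size passage split, the example weights and the IDF values are all hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def split_passages(tokens, n_passages):
    """Split a token list into n_passages contiguous, near-equal chunks."""
    size = math.ceil(len(tokens) / n_passages) or 1
    return [tokens[i * size:(i + 1) * size] for i in range(n_passages)]


def bm25p_score(query, doc, weights, idf, avgdl, k1=1.2, b=0.75):
    """BM25-style score with a passage-weighted term frequency.

    Assumption: tf is replaced by sum_i weights[i] * tf_i, where tf_i is
    the term count in passage i. The article may define this differently.
    """
    counts = [Counter(p) for p in split_passages(doc, len(weights))]
    # Standard BM25 length normalisation, computed on the whole document.
    norm = k1 * (1 - b + b * len(doc) / avgdl)
    score = 0.0
    for term in query:
        tf = sum(w * c[term] for w, c in zip(weights, counts))
        if tf > 0:
            score += idf.get(term, 0.0) * tf * (k1 + 1) / (tf + norm)
    return score


# Hypothetical toy data: a 10-token document split into two passages.
doc = ["bm25", "ranks", "documents", "by", "term",
       "statistics", "and", "length", "using", "bm25"]
idf = {"bm25": 2.0}

uniform = bm25p_score(["bm25"], doc, [1.0, 1.0], idf, avgdl=10.0)
front_loaded = bm25p_score(["bm25"], doc, [3.0, 1.0], idf, avgdl=10.0)
print(uniform, front_loaded)
```

With all weights equal to 1 the sketch reduces to plain BM25, which makes it easy to compare against a baseline. Raising the weight of a passage increases the contribution of terms that occur there, which is the effect the description attributes to collection-dependent passage weights.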