Long Document Re-ranking with Modular Re-ranker

Long document re-ranking has been a challenging problem for neural re-rankers based on deep language models like BERT. Early work breaks documents into short, passage-like chunks. These chunks are independently mapped to scalar scores or latent vectors, which are then pooled into a final relevance score. These encode-and-pool methods, however, inevitably introduce an information bottleneck: the low-dimensional representations. In this paper, we propose instead to model full query-to-document interaction, leveraging the attention operation and a modular Transformer re-ranker framework. First, document chunks are encoded independently with an encoder module. An interaction module then encodes the query and performs joint attention from the query to all document chunk representations. We demonstrate that the model can use this new degree of freedom to aggregate important information from the entire document. Our experiments show that this design produces effective re-ranking on two classical IR collections, Robust04 and ClueWeb09, and on a large-scale supervised collection, MS MARCO document ranking.
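
As a rough illustration of the encode-then-interact design the abstract describes, here is a minimal PyTorch sketch. It is not the authors' implementation: the class and parameter names are invented for this example, and the paper's Transformer interaction module is approximated here with a single cross-attention step from the encoded query to the concatenated chunk representations.

import torch
import torch.nn as nn

class ModularReranker(nn.Module):
    """Illustrative encode-then-interact re-ranker (names are hypothetical)."""
    def __init__(self, vocab_size=30522, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Encoder module: encodes each document chunk independently
        # (chunks ride along the batch dimension).
        self.chunk_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers)
        # Interaction module, simplified: encode the query, then attend
        # from the query to all chunk token representations at once.
        self.query_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.score = nn.Linear(d_model, 1)

    def forward(self, query_ids, chunk_ids):
        # query_ids: (1, query_len); chunk_ids: (n_chunks, chunk_len).
        chunk_reprs = self.chunk_encoder(self.embed(chunk_ids))      # (n_chunks, chunk_len, d)
        # Keep every token vector; concatenating chunks avoids the
        # encode-and-pool bottleneck of one low-dimensional vector per chunk.
        doc_reprs = chunk_reprs.reshape(1, -1, chunk_reprs.size(-1)) # (1, n_chunks*chunk_len, d)
        query_reprs = self.query_encoder(self.embed(query_ids))      # (1, query_len, d)
        # Joint attention from the query to the entire document.
        attended, _ = self.cross_attn(query_reprs, doc_reprs, doc_reprs)
        return self.score(attended[:, 0]).squeeze(-1)                # scalar relevance score

A toy invocation scores one document per call, e.g. model(query_ids, chunk_ids) with query_ids of shape (1, query_len) and chunk_ids of shape (n_chunks, chunk_len). Because every chunk token vector is retained, the query can attend to evidence anywhere in the document rather than to one pooled vector per chunk, which is the degree of freedom the abstract highlights.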

Bibliographic Details

Published in: arXiv.org, 2022-06
Main authors: Gao, Luyu; Callan, Jamie
Format: Article
Language: English
Subjects: Computer Science - Computation and Language; Computer Science - Information Retrieval
Online access: Full text
DOI: 10.48550/arxiv.2205.04275
EISSN: 2331-8422