Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker
Format: Article
Language: English
Abstract: Retrieval with extremely long queries and documents is a well-known and challenging task in information retrieval and is commonly known as Query-by-Document (QBD) retrieval. Specifically designed Transformer models that can handle long input sequences have not shown high effectiveness in QBD tasks in previous work. We propose a Re-Ranker based on the novel Proportional Relevance Score (RPRS) to compute the relevance score between a query and the top-k candidate documents. Our extensive evaluation shows that RPRS obtains significantly better results than the state-of-the-art models on five different datasets. Furthermore, RPRS is highly efficient, since all documents can be pre-processed, embedded, and indexed before query time, which gives our re-ranker a complexity of O(N), where N is the total number of sentences in the query and candidate documents. Moreover, our method addresses the problem of low-resource training in QBD retrieval tasks: it does not need large amounts of training data and has only three parameters with a limited range, which can be optimized with a grid search even when only a small amount of labeled data is available. Our detailed analysis shows that RPRS benefits from covering the full length of candidate documents and queries.
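The abstract describes a sentence-level re-ranking workflow: candidate documents are split into sentences, embedded, and indexed offline, and at query time each top-k candidate is scored by aggregating sentence-level evidence, controlled by only a few tunable parameters. The sketch below illustrates that general workflow only; the `embed_sentences` stand-in encoder, the `proportional_score` rule, and the `sim_threshold` parameter are illustrative assumptions, not the paper's actual RPRS definition.

```python
# Illustrative sketch only -- NOT the authors' RPRS formula.
# It mimics the workflow the abstract describes: documents are pre-embedded
# sentence by sentence offline, and each top-k candidate is re-ranked at
# query time by aggregating sentence-level similarities.
import numpy as np

EMB_DIM = 384  # assumed embedding size; any sentence encoder could be plugged in


def embed_sentences(sentences):
    """Stand-in for a real sentence encoder (e.g. a Sentence-BERT model)."""
    rng = np.random.default_rng(abs(hash(tuple(sentences))) % (2**32))
    vecs = rng.normal(size=(len(sentences), EMB_DIM))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


def index_documents(docs):
    """Offline phase: pre-embed every candidate document, sentence by sentence."""
    return {doc_id: embed_sentences(sents) for doc_id, sents in docs.items()}


def proportional_score(query_emb, doc_emb, sim_threshold=0.3):
    """Toy 'proportional' score: the fraction of query sentences whose best-matching
    document sentence exceeds a similarity threshold, averaged with the reverse
    direction. The threshold stands in for one of the few tunable parameters the
    abstract mentions; the real RPRS definition differs."""
    sims = query_emb @ doc_emb.T                        # cosine similarities
    q_covered = (sims.max(axis=1) >= sim_threshold).mean()
    d_covered = (sims.max(axis=0) >= sim_threshold).mean()
    return 0.5 * (q_covered + d_covered)


def rerank(query_sents, doc_index, top_k_ids, sim_threshold=0.3):
    """Query time: embed the query once, then score each pre-indexed candidate."""
    query_emb = embed_sentences(query_sents)
    scored = [(doc_id, proportional_score(query_emb, doc_index[doc_id], sim_threshold))
              for doc_id in top_k_ids]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "doc_A": ["A method for ranking long documents.", "Sentence embeddings are compared."],
        "doc_B": ["A device for measuring temperature.", "The sensor uses infrared light."],
    }
    index = index_documents(docs)
    query = ["We rank long documents.", "Similarity is computed per sentence."]
    # With the random stand-in encoder the printed scores are meaningless;
    # a real sentence encoder is needed for sensible output.
    print(rerank(query, index, top_k_ids=["doc_A", "doc_B"]))
```

Because every document sentence is embedded and indexed before query time, the query-time cost scales with the total number of sentences involved, matching the O(N) behaviour the abstract claims; the small number of scoring parameters is what makes tuning by grid search feasible with little labeled data.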
DOI: 10.48550/arxiv.2303.01200