ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference
| Main authors: | , , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Online access: | Order full text |
Abstract:

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms, however, are not without flaws: running the model on all query-document pairs at inference time incurs a significant computational cost. This paper proposes a new training and inference paradigm for re-ranking. We propose to finetune a pretrained encoder-decoder model on a document-to-query generation task. Subsequently, we show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference. This results in significant inference-time speedups, since the decoder-only architecture only needs to interpret static encoder embeddings during inference. Our experiments show that this new paradigm achieves results comparable to the more expensive cross-attention ranking approaches while being up to 6.8x faster. We believe this work paves the way for more efficient neural rankers that leverage large pretrained models.
DOI: 10.48550/arxiv.2204.11458
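The sketch below illustrates the decomposition idea described in the abstract: encoder outputs for documents are computed once offline, and at query time only the decoder runs, scoring each document by the likelihood of generating the query from its cached (static) encoder embeddings. This is not the authors' released code; it is a minimal illustration using a HuggingFace Transformers T5 checkpoint, and the model name ("t5-small"), toy documents, and query are assumptions for demonstration only.

```python
# Minimal sketch (not the ED2LM authors' code) of decomposed re-ranking:
# encode documents once offline, then at query time run only the decoder
# and score each document by the log-likelihood of generating the query.
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

# Illustrative checkpoint and toy data (assumptions, not from the paper).
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

documents = [
    "ED2LM decomposes an encoder-decoder into a decoder-only ranker.",
    "Cross-attention rankers score every query-document pair online.",
]
query = "how to speed up document re-ranking inference"

# Offline step: run the encoder once per document and cache the outputs.
doc_inputs = tokenizer(documents, return_tensors="pt", padding=True)
with torch.no_grad():
    cached_encodings = model.encoder(**doc_inputs).last_hidden_state

# Online step: the decoder consumes the static encoder embeddings; the
# relevance score is the negative loss of generating the query tokens.
query_ids = tokenizer(query, return_tensors="pt").input_ids
scores = []
with torch.no_grad():
    for i in range(len(documents)):
        enc = BaseModelOutput(last_hidden_state=cached_encodings[i : i + 1])
        out = model(
            encoder_outputs=enc,
            attention_mask=doc_inputs.attention_mask[i : i + 1],
            labels=query_ids,
        )
        scores.append(-out.loss.item())  # higher query likelihood => more relevant

ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Because the document-side encoder pass happens once ahead of time, only the lighter decoder-only pass remains per query-document pair at inference, which is the source of the speedup the abstract reports.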