Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Cross-encoder (CE) models, which compute similarity by jointly encoding a
query-item pair, perform better than embedding-based models (dual-encoders) at
estimating query-item relevance. Existing approaches perform k-NN search with
CE by approximating the CE similarity with a vector embedding space fit either
with dual-encoders (DE) or CUR matrix factorization. DE-based
retrieve-and-rerank approaches suffer from poor recall on new domains, and the
retrieval with DE is decoupled from the CE. While CUR-based approaches can be
more accurate than the DE-based approach, they require a prohibitively large
number of CE calls to compute item embeddings, making them impractical for
deployment at scale. In this paper, we address these shortcomings with a
sparse-matrix-factorization-based method that efficiently computes
latent query and item embeddings to approximate CE scores and performs k-NN
search with the approximate CE similarity. We compute item embeddings offline
by factorizing a sparse matrix containing query-item CE scores for a set of
train queries. Our method produces a high-quality approximation while requiring
only a fraction of the CE calls needed by CUR-based methods, and it allows
leveraging DE to initialize the embedding space while avoiding compute- and
resource-intensive finetuning of DE via distillation. At test time, the item
embeddings remain fixed and retrieval occurs over rounds, alternating between
a) estimating the test query embedding by minimizing the error in approximating
the CE scores of items retrieved thus far, and b) using the updated test query
embedding to retrieve more items. Our k-NN search method improves recall by
up to 5% (k=1) and 54% (k=100) over DE-based approaches. Additionally, our
indexing approach achieves a speedup of up to 100x over CUR-based and 5x over
DE distillation methods, while matching or improving k-NN search recall over
baselines. |
---|---|
DOI: | 10.48550/arxiv.2405.03651 |
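
The test-time procedure in the summary (fixed item embeddings, with rounds that alternate between re-estimating the query embedding and retrieving more items) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `adaptive_knn`, the `ce_score` callback, the random seeding of the first round, the even split of the CE-call budget across rounds, and the use of an ordinary least-squares solve to minimize the approximation error are all assumptions made for illustration.

```python
import numpy as np


def adaptive_knn(ce_score, item_emb, k, budget, n_rounds, rng):
    """Sketch of round-based k-NN retrieval with fixed item embeddings.

    ce_score: hypothetical callable mapping an array of item indices to the
              exact cross-encoder scores of those items for the test query.
    item_emb: (n_items, dim) array of item embeddings computed offline.
    budget:   total number of CE calls allowed, split evenly across rounds.
    """
    n_items, _ = item_emb.shape
    per_round = budget // n_rounds
    if per_round < 1 or budget > n_items:
        raise ValueError("need n_rounds <= budget <= n_items")

    # Round 0: no query embedding exists yet, so seed with random items
    # (an assumption; a dual-encoder retrieval could seed this round instead).
    retrieved = rng.choice(n_items, size=per_round, replace=False)
    scores = ce_score(retrieved)

    for _ in range(n_rounds - 1):
        # (a) Estimate the query embedding by least squares, so that
        #     item_emb[retrieved] @ query_emb approximates the CE scores seen so far.
        query_emb, *_ = np.linalg.lstsq(item_emb[retrieved], scores, rcond=None)

        # (b) Use the updated embedding to pick the best not-yet-scored items.
        approx = item_emb @ query_emb
        approx[retrieved] = -np.inf  # never re-score an item
        new_items = np.argsort(-approx)[:per_round]

        retrieved = np.concatenate([retrieved, new_items])
        scores = np.concatenate([scores, ce_score(new_items)])

    # Final ranking uses the exact CE scores of everything retrieved.
    order = np.argsort(-scores)[:k]
    return retrieved[order], scores[order]


if __name__ == "__main__":
    # Synthetic stand-in for a cross-encoder: scores are exactly low-rank.
    rng = np.random.default_rng(0)
    items = rng.normal(size=(500, 8))
    hidden_query = rng.normal(size=8)
    top_ids, top_scores = adaptive_knn(
        lambda idx: items[idx] @ hidden_query, items,
        k=10, budget=100, n_rounds=5, rng=rng,
    )
    print(top_ids, top_scores)
```

In this toy setting the scores are exactly linear in the item embeddings, so the least-squares estimate recovers the query embedding after the first round. With a real cross-encoder the fit is only approximate, which is why spreading the budget over several rounds can help.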