Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization
Efficient k-nearest neighbor search is a fundamental task, foundational for many problems in NLP. When the similarity is measured by dot-product between dual-encoder vectors or $\ell_2$-distance, there already exist many scalable and efficient search methods. But not so when similarity is measured b...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Efficient k-nearest neighbor search is a fundamental task, foundational for
many problems in NLP. When the similarity is measured by dot-product between
dual-encoder vectors or $\ell_2$-distance, there already exist many scalable
and efficient search methods. But not so when similarity is measured by more
accurate and expensive black-box neural similarity models, such as
cross-encoders, which jointly encode the query and candidate neighbor. The
cross-encoders' high computational cost typically limits their use to reranking
candidates retrieved by a cheaper model, such as dual encoder or TF-IDF.
However, the accuracy of such a two-stage approach is upper-bounded by the
recall of the initial candidate set, and potentially requires additional
training to align the auxiliary retrieval model with the cross-encoder model.
In this paper, we present an approach that avoids the use of a dual-encoder for
retrieval, relying solely on the cross-encoder. Retrieval is made efficient
with CUR decomposition, a matrix decomposition approach that approximates all
pairwise cross-encoder distances from a small subset of rows and columns of the
distance matrix. Indexing items using our approach is computationally cheaper
than training an auxiliary dual-encoder model through distillation.
Empirically, for k > 10, our approach provides test-time
recall-vs-computational cost trade-offs superior to the current widely-used
methods that re-rank items retrieved using a dual-encoder or TF-IDF. |
---|---|
DOI: | 10.48550/arxiv.2210.12579 |