DiskANN++: Efficient Page-based Search over Isomorphic Mapped Graph Index using Query-sensitivity Entry Vertex
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Given a vector dataset $\mathcal{X}$ and a query vector $\vec{x}_q$,
graph-based Approximate Nearest Neighbor Search (ANNS) aims to build a graph
index $G$ and approximately return the vectors with minimum distance to
$\vec{x}_q$ by searching over $G$. The main drawback of graph-based ANNS is
that the graph index can be too large to fit into memory, especially for a
large-scale $\mathcal{X}$. To solve this, a Product Quantization (PQ)-based
hybrid method called DiskANN has been proposed, which stores a low-dimensional
PQ index in memory and keeps the graph index on SSD, thus reducing memory
overhead while ensuring high search accuracy. However, it suffers from two I/O
issues that significantly affect overall efficiency: (1) a long routing path
from the entry vertex to the query's neighborhood, which results in a large
number of I/O requests, and (2) redundant I/O requests during the routing
process. We propose an optimized DiskANN++ to overcome these issues.
Specifically, for the first issue, we present a query-sensitive entry vertex
selection strategy that replaces DiskANN's static graph-central entry vertex
with a dynamically determined entry vertex that is close to the query. For the
second issue, we present an isomorphic mapping on DiskANN's graph index to
optimize the SSD layout and propose an asynchronously optimized Pagesearch
based on the optimized SSD layout as an alternative to DiskANN's beamsearch.
Comprehensive experimental studies on eight real-world datasets demonstrate
DiskANN++'s superior efficiency. We achieve a notable 1.5x to 2.2x improvement
in QPS compared to DiskANN, given the same accuracy constraint. |
---|---|
DOI: | 10.48550/arxiv.2310.00402 |
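
The summary contrasts a static graph-central entry vertex with an entry vertex chosen per query. The following is a minimal sketch of that contrast only, not the selection procedure used by DiskANN++, which the summary does not describe. It assumes a small in-memory list of vectors, uses the vertex nearest the dataset centroid as a stand-in for the graph-central vertex, and picks the query-sensitive entry from a hand-supplied candidate list. The names `squared_l2`, `static_entry`, and `query_sensitive_entry` are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def static_entry(dataset):
    """Index of the vector nearest the dataset centroid.

    Assumption: this stands in for a static, graph-central entry vertex.
    """
    dim = len(dataset[0])
    centroid = [sum(v[d] for v in dataset) / len(dataset) for d in range(dim)]
    return min(range(len(dataset)), key=lambda i: squared_l2(dataset[i], centroid))


def query_sensitive_entry(dataset, candidates, query):
    """Index of the candidate vertex closest to the query.

    Assumption: `candidates` is a small, precomputed list of vertex ids
    kept in memory; how such a list would be built is not specified here.
    """
    return min(candidates, key=lambda i: squared_l2(dataset[i], query))


# Toy data: two clusters and one point near the middle.
dataset = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [2.5, 2.4]]
query = [5.0, 5.2]

static = static_entry(dataset)
candidates = [static, 0, 2]  # includes the static entry, so it can never do worse
entry = query_sensitive_entry(dataset, candidates, query)

print("static entry:", static, "distance^2:", squared_l2(dataset[static], query))
print("query-sensitive entry:", entry, "distance^2:", squared_l2(dataset[entry], query))
```

Because the candidate list includes the static entry, the chosen vertex is never farther from the query than the static one. A closer starting point is what the summary associates with a shorter routing path and fewer I/O requests.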