Manu: A Cloud Native Vector Database Management System
Main authors: | , , , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | With the development of learning-based embedding models, embedding vectors
are widely used for analyzing and searching unstructured data. As vector
collections exceed billion-scale, fully managed and horizontally scalable
vector databases are necessary. In the past three years, through interaction
with our 1200+ industry users, we have sketched a vision for the features that
next-generation vector databases should have, which include long-term
evolvability, tunable consistency, good elasticity, and high performance. We
present Manu, a cloud native vector database that implements these features. It
is difficult to integrate all these features if we follow traditional DBMS
design rules. As most vector data applications do not require complex data
models and strong data consistency, our design philosophy is to relax the data
model and consistency constraints in exchange for the aforementioned features.
Specifically, Manu first exposes the write-ahead log (WAL) and binlog as
backbone services. Second, write components are designed as log publishers
while all read-only analytic and search components are designed as independent
subscribers to the log services. Finally, we utilize multi-version concurrency
control (MVCC) and a delta consistency model to simplify the communication and
cooperation among the system components. These designs achieve low coupling
among the system components, which is essential for elasticity and evolution.
We also extensively optimize Manu for performance and usability with
hardware-aware implementations and support for complex search semantics. |
---|---|
DOI: | 10.48550/arxiv.2206.13843 |
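
The log-centric design described in the summary (writers publish to a log, read-only components subscribe to it, and a delta consistency model bounds staleness) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names `Log`, `SearchNode`, `pull`, and `can_serve` are hypothetical, integer timestamps stand in for real clocks, and the staleness check is one plausible reading of "delta consistency", not Manu's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only log standing in for the WAL service (hypothetical)."""
    entries: list = field(default_factory=list)  # (timestamp, key, vector)

    def publish(self, ts, key, vector):
        # Writers only append; they never talk to readers directly.
        self.entries.append((ts, key, vector))


@dataclass
class SearchNode:
    """Read-only subscriber that replays the log at its own pace."""
    log: Log
    offset: int = 0       # next log position to consume
    synced_ts: int = 0    # timestamp of the last entry applied
    data: dict = field(default_factory=dict)

    def pull(self, max_entries=None):
        # Apply pending log entries, optionally only the next few.
        end = len(self.log.entries)
        if max_entries is not None:
            end = min(end, self.offset + max_entries)
        for ts, key, vector in self.log.entries[self.offset:end]:
            self.data[key] = vector
            self.synced_ts = ts
        self.offset = end

    def can_serve(self, query_ts, delta):
        # Assumed rule: the node's view may lag the query time
        # by at most `delta` time units.
        return query_ts - self.synced_ts <= delta


log = Log()
for ts in (1, 2, 3, 4):
    log.publish(ts, f"v{ts}", [float(ts)])

node = SearchNode(log)
node.pull(max_entries=2)                  # node has applied entries up to ts=2
stale = node.can_serve(query_ts=4, delta=1)
node.pull()                               # catch up with the rest of the log
fresh = node.can_serve(query_ts=4, delta=1)
```

In this sketch the writer and the search node share only the log, so either side could be scaled or replaced without changing the other, which is the low-coupling property the summary emphasizes. A larger `delta` lets a lagging node answer immediately at the cost of staler results.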