SPFresh: Incremental In-Place Update for Billion-Scale Vector Search

Bibliographic Details
Published in: arXiv.org 2024-10
Main Authors: Xu, Yuming; Liang, Hengyu; Li, Jin; Xu, Shuotao; Chen, Qi; Zhang, Qianxi; Cheng, Li; Yang, Ziyue; Yang, Fan; Yang, Yuqing; Cheng, Peng; Mao, Yang
Format: Article
Language: English
Subjects: Computer Science - Information Retrieval; Information retrieval; Searching
Online Access: Full text
container_title arXiv.org
creator Xu, Yuming
Liang, Hengyu
Li, Jin
Xu, Shuotao
Chen, Qi
Zhang, Qianxi
Cheng, Li
Yang, Ziyue
Yang, Fan
Yang, Yuqing
Cheng, Peng
Mao, Yang
description Approximate Nearest Neighbor Search (ANNS) is now widely used in various applications, ranging from information retrieval, question answering, and recommendation, to search for similar high-dimensional vectors. As the amount of vector data grows continuously, it becomes important to support updates to the vector index, the enabling technique that allows for efficient and accurate ANNS on vectors. Because of the curse of high dimensionality, it is often costly to identify the right neighbors of a single new vector, a necessary step for index updates. To amortize update costs, existing systems maintain a secondary index to accumulate updates, which are merged into the main index by periodically rebuilding the entire index globally. However, this approach causes high fluctuations in search latency and accuracy, not to mention that rebuilds require substantial resources and are extremely time-consuming. We introduce SPFresh, a system that supports in-place vector updates. At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol that splits vector partitions and reassigns vectors in nearby partitions to adapt to data distribution shift. LIRE achieves low-overhead vector updates by reassigning only the vectors at the boundary between partitions, and in a high-quality vector index such vectors are deemed few. With LIRE, SPFresh provides better query latency and accuracy than solutions based on global rebuilds, needing only 1% of the DRAM and fewer than 10% of the cores at peak compared to the state of the art, on a billion-scale vector index with a 1% daily vector update rate.
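The description characterizes LIRE only at a high level (split an oversized partition, then reassign vectors in nearby partitions). The Python sketch below illustrates that idea under loudly stated assumptions; it is not SPFresh's implementation. It models a toy partition-based index in which each partition has a centroid, an insert appends a vector in place to its closest partition, an oversized partition is split with 2-means, and only the vectors in the two split halves plus a few nearby partitions are re-checked and moved to their closest centroid. The class `PartitionIndex`, its methods, the `max_size` split threshold, and the `num_neighbors` neighborhood size are all hypothetical, and the simple nearest-centroid check stands in for whatever reassignment rules the paper actually uses.

```python
import numpy as np


class PartitionIndex:
    """Hypothetical toy index with LIRE-style split-and-reassign (not SPFresh's code)."""

    def __init__(self, max_size=8, num_neighbors=2, seed=0):
        self.max_size = max_size            # hypothetical split threshold
        self.num_neighbors = num_neighbors  # hypothetical: nearby partitions re-checked after a split
        self.centroids = []                 # one centroid (np.ndarray) per partition
        self.postings = []                  # one dict {vector_id: vector} per partition
        self.rng = np.random.default_rng(seed)

    def _dists(self, v):
        # Euclidean distance from v to every centroid.
        return np.linalg.norm(np.stack(self.centroids) - v, axis=1)

    def insert(self, vid, v):
        v = np.asarray(v, dtype=float)
        if not self.centroids:
            self.centroids.append(v.copy())
            self.postings.append({vid: v})
            return
        p = int(np.argmin(self._dists(v)))
        self.postings[p][vid] = v  # in-place append to the closest partition
        if len(self.postings[p]) > self.max_size:
            self._split(p)

    def _split(self, p):
        ids = list(self.postings[p])
        X = np.stack([self.postings[p][i] for i in ids])
        c = X[self.rng.choice(len(X), size=2, replace=False)].copy()
        for _ in range(10):  # a few rounds of 2-means on the oversized partition
            labels = np.argmin(((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2), axis=1)
            for k in range(2):
                if np.any(labels == k):
                    c[k] = X[labels == k].mean(axis=0)
        if labels.min() == labels.max():
            return  # degenerate split (e.g. duplicate vectors): keep the partition as is
        old_centroid = self.centroids[p]
        # Partition p keeps cluster 0; a new partition q takes cluster 1.
        self.centroids[p] = c[0]
        self.postings[p] = {i: X[j] for j, i in enumerate(ids) if labels[j] == 0}
        self.centroids.append(c[1])
        self.postings.append({i: X[j] for j, i in enumerate(ids) if labels[j] == 1})
        q = len(self.centroids) - 1
        self._reassign(old_centroid, [p, q])

    def _reassign(self, old_centroid, split_parts):
        # Re-examine only the split halves and the partitions nearest the old centroid.
        d = np.linalg.norm(np.stack(self.centroids) - old_centroid, axis=1)
        nearby = [int(i) for i in np.argsort(d) if int(i) not in split_parts]
        candidates = split_parts + nearby[: self.num_neighbors]
        moves = []
        for p in candidates:
            for vid, v in self.postings[p].items():
                best = int(np.argmin(self._dists(v)))
                if best != p:
                    moves.append((vid, p, best))
        for vid, src, dst in moves:  # apply moves after scanning, not while iterating
            self.postings[dst][vid] = self.postings[src].pop(vid)
        return len(moves)

    def search(self, q, k=1, nprobe=2):
        # Scan only the nprobe partitions whose centroids are closest to q.
        q = np.asarray(q, dtype=float)
        probe = np.argsort(self._dists(q))[:nprobe]
        hits = [(float(np.linalg.norm(v - q)), vid)
                for p in probe for vid, v in self.postings[int(p)].items()]
        return [vid for _, vid in sorted(hits)[:k]]


# Usage: build a small index from random 4-D vectors, one in-place insert at a time.
rng = np.random.default_rng(42)
data = rng.normal(size=(200, 4))
index = PartitionIndex(max_size=16, num_neighbors=2)
for i, x in enumerate(data):
    index.insert(i, x)
print(len(index.centroids), "partitions,", sum(len(p) for p in index.postings), "vectors")
```

Because each split re-examines only a handful of partitions, its cost stays local instead of growing with the whole index, which is the property the abstract attributes to LIRE. The sketch deliberately omits deletions and does not cascade: a partition pushed past `max_size` by reassignment is only split on its next insert.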
doi_str_mv 10.48550/arxiv.2410.14452
format Article
publisher Cornell University Library, arXiv.org (Ithaca)
publication_date 2024-10-18
rights CC BY-NC-ND 4.0, http://creativecommons.org/licenses/by-nc-nd/4.0/
published_version https://doi.org/10.1145/3600006.3613166 (access to full text may be restricted)
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-10
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2410_14452
source arXiv.org; Free E- Journals
subjects Computer Science - Information Retrieval
Information retrieval
Searching
title SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T17%3A41%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SPFresh:%20Incremental%20In-Place%20Update%20for%20Billion-Scale%20Vector%20Search&rft.jtitle=arXiv.org&rft.au=Xu,%20Yuming&rft.date=2024-10-18&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2410.14452&rft_dat=%3Cproquest_arxiv%3E3118915435%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3118915435&rft_id=info:pmid/&rfr_iscdi=true