SPFresh: Incremental In-Place Update for Billion-Scale Vector Search

Approximate Nearest Neighbor Search (ANNS) is now widely used in various applications, ranging from information retrieval, question answering, and recommendation, to search for similar high-dimensional vectors. As the amount of vector data grows continuously, it becomes important to support updates...

Bibliographic details
Published in:	arXiv.org 2024-10
Main authors:	Xu, Yuming; Liang, Hengyu; Li, Jin; Xu, Shuotao; Chen, Qi; Zhang, Qianxi; Cheng, Li; Yang, Ziyue; Yang, Fan; Yang, Yuqing; Cheng, Peng; Mao, Yang
Format:	Article
Language:	eng
Subjects:	Computer Science - Information Retrieval; Information retrieval; Searching
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Xu, Yuming; Liang, Hengyu; Li, Jin; Xu, Shuotao; Chen, Qi; Zhang, Qianxi; Cheng, Li; Yang, Ziyue; Yang, Fan; Yang, Yuqing; Cheng, Peng; Mao, Yang
description	Approximate Nearest Neighbor Search (ANNS) is now widely used in applications ranging from information retrieval and question answering to recommendation, wherever similar high-dimensional vectors must be found. As the amount of vector data grows continuously, it becomes important to support updates to the vector index, the enabling technique that allows for efficient and accurate ANNS on vectors. Because of the curse of dimensionality, it is often costly to identify the right neighbors of a single new vector, a necessary step in an index update. To amortize update costs, existing systems maintain a secondary index to accumulate updates, which are merged into the main index by periodically rebuilding the entire index globally. However, this approach causes high fluctuations in search latency and accuracy, not to mention that rebuilds require substantial resources and are extremely time-consuming. We introduce SPFresh, a system that supports in-place vector updates. At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol that splits vector partitions and reassigns vectors in nearby partitions to adapt to data distribution shift. LIRE achieves low-overhead vector updates by reassigning only vectors at the boundary between partitions, where, in a high-quality vector index, the number of such vectors is deemed small. With LIRE, SPFresh provides superior query latency and accuracy compared to solutions based on global rebuild, while needing only 1% of the DRAM and less than 10% of the cores at peak compared to the state of the art, in a billion-scale vector index with a daily vector update rate of 1%.
doi_str_mv	10.48550/arxiv.2410.14452
format	Article
fullrecord	<record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2410_14452</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3118915435</sourcerecordid><originalsourceid>FETCH-LOGICAL-a525-9029cbb83885885be9afbafb2c6dae77b59dabffa4f945fd79050b55e1520023</originalsourceid><addsrcrecordid>eNotj19Lw0AQxA9BsNR-AJ8M-Jx6_7bJ-abVaqFgIepr2Lvs0ZQ0qZdU9Nt7tsLALMOwzI-xK8GnOgfgtxi-66-p1DEQWoM8YyOplEhzLeUFm_T9lnMuZ5kEUCP2WKwXgfrNXbJsXaAdtQM28U7XDTpK3vcVDpT4LiQPddPUXZsWDhtKPsgNMSwIg9tcsnOPTU-Tfx-zYvH0Nn9JV6_Py_n9KkWQkBoujbM2V3kOUZYMehsl3axCyjILpkLrPWpvNPgqMxy4BSABMi5WY3Z9-nokLPeh3mH4Kf9IyyNpbNycGvvQfR6oH8ptdwhtnFQqIXIjQCtQv3MwVUQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3118915435</pqid></control><display><type>article</type><title>SPFresh: Incremental In-Place Update for Billion-Scale Vector Search</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Xu, Yuming ; Liang, Hengyu ; Li, Jin ; Xu, Shuotao ; Chen, Qi ; Zhang, Qianxi ; Cheng, Li ; Yang, Ziyue ; Yang, Fan ; Yang, Yuqing ; Cheng, Peng ; Mao, Yang</creator><creatorcontrib>Xu, Yuming ; Liang, Hengyu ; Li, Jin ; Xu, Shuotao ; Chen, Qi ; Zhang, Qianxi ; Cheng, Li ; Yang, Ziyue ; Yang, Fan ; Yang, Yuqing ; Cheng, Peng ; Mao, Yang</creatorcontrib><description>Approximate Nearest Neighbor Search (ANNS) is now widely used in various applications, ranging from information retrieval, question answering, and recommendation, to search for similar high-dimensional vectors. As the amount of vector data grows continuously, it becomes important to support updates to vector index, the enabling technique that allows for efficient and accurate ANNS on vectors. Because of the curse of high dimensionality, it is often costly to identify the right neighbors of a single new vector, a necessary process for index update. 
To amortize update costs, existing systems maintain a secondary index to accumulate updates, which are merged by the main index by global rebuilding the entire index periodically. However, this approach has high fluctuations of search latency and accuracy, not even to mention that it requires substantial resources and is extremely time-consuming for rebuilds. We introduce SPFresh, a system that supports in-place vector updates. At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol to split vector partitions and reassign vectors in the nearby partitions to adapt to data distribution shift. LIRE achieves low-overhead vector updates by only reassigning vectors at the boundary between partitions, where in a high-quality vector index the amount of such vectors are deemed small. With LIRE, SPFresh provides superior query latency and accuracy to solutions based on global rebuild, with only 1% of DRAM and less than 10% cores needed at the peak compared to the state-of-the-art, in a billion scale vector index with 1% of daily vector update rate.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2410.14452</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Computer Science - Information Retrieval ; Information retrieval ; Searching</subject><ispartof>arXiv.org, 2024-10</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27904</link.rule.ids><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.14452$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.1145/3600006.3613166$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Xu, Yuming</creatorcontrib><creatorcontrib>Liang, Hengyu</creatorcontrib><creatorcontrib>Li, Jin</creatorcontrib><creatorcontrib>Xu, Shuotao</creatorcontrib><creatorcontrib>Chen, Qi</creatorcontrib><creatorcontrib>Zhang, Qianxi</creatorcontrib><creatorcontrib>Cheng, Li</creatorcontrib><creatorcontrib>Yang, Ziyue</creatorcontrib><creatorcontrib>Yang, Fan</creatorcontrib><creatorcontrib>Yang, Yuqing</creatorcontrib><creatorcontrib>Cheng, Peng</creatorcontrib><creatorcontrib>Mao, Yang</creatorcontrib><title>SPFresh: Incremental In-Place Update for Billion-Scale Vector Search</title><title>arXiv.org</title><description>Approximate Nearest Neighbor Search (ANNS) is now widely used in various applications, ranging from information retrieval, question answering, and recommendation, to search for similar high-dimensional vectors. As the amount of vector data grows continuously, it becomes important to support updates to vector index, the enabling technique that allows for efficient and accurate ANNS on vectors. Because of the curse of high dimensionality, it is often costly to identify the right neighbors of a single new vector, a necessary process for index update. 
To amortize update costs, existing systems maintain a secondary index to accumulate updates, which are merged by the main index by global rebuilding the entire index periodically. However, this approach has high fluctuations of search latency and accuracy, not even to mention that it requires substantial resources and is extremely time-consuming for rebuilds. We introduce SPFresh, a system that supports in-place vector updates. At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol to split vector partitions and reassign vectors in the nearby partitions to adapt to data distribution shift. LIRE achieves low-overhead vector updates by only reassigning vectors at the boundary between partitions, where in a high-quality vector index the amount of such vectors are deemed small. With LIRE, SPFresh provides superior query latency and accuracy to solutions based on global rebuild, with only 1% of DRAM and less than 10% cores needed at the peak compared to the state-of-the-art, in a billion scale vector index with 1% of daily vector update rate.</description><subject>Computer Science - Information Retrieval</subject><subject>Information retrieval</subject><subject>Searching</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotj19Lw0AQxA9BsNR-AJ8M-Jx6_7bJ-abVaqFgIepr2Lvs0ZQ0qZdU9Nt7tsLALMOwzI-xK8GnOgfgtxi-66-p1DEQWoM8YyOplEhzLeUFm_T9lnMuZ5kEUCP2WKwXgfrNXbJsXaAdtQM28U7XDTpK3vcVDpT4LiQPddPUXZsWDhtKPsgNMSwIg9tcsnOPTU-Tfx-zYvH0Nn9JV6_Py_n9KkWQkBoujbM2V3kOUZYMehsl3axCyjILpkLrPWpvNPgqMxy4BSABMi5WY3Z9-nokLPeh3mH4Kf9IyyNpbNycGvvQfR6oH8ptdwhtnFQqIXIjQCtQv3MwVUQ</recordid><startdate>20241018</startdate><enddate>20241018</enddate><creator>Xu, Yuming</creator><creator>Liang, 
Hengyu</creator><creator>Li, Jin</creator><creator>Xu, Shuotao</creator><creator>Chen, Qi</creator><creator>Zhang, Qianxi</creator><creator>Cheng, Li</creator><creator>Yang, Ziyue</creator><creator>Yang, Fan</creator><creator>Yang, Yuqing</creator><creator>Cheng, Peng</creator><creator>Mao, Yang</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241018</creationdate><title>SPFresh: Incremental In-Place Update for Billion-Scale Vector Search</title><author>Xu, Yuming ; Liang, Hengyu ; Li, Jin ; Xu, Shuotao ; Chen, Qi ; Zhang, Qianxi ; Cheng, Li ; Yang, Ziyue ; Yang, Fan ; Yang, Yuqing ; Cheng, Peng ; Mao, Yang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a525-9029cbb83885885be9afbafb2c6dae77b59dabffa4f945fd79050b55e1520023</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Information Retrieval</topic><topic>Information retrieval</topic><topic>Searching</topic><toplevel>online_resources</toplevel><creatorcontrib>Xu, Yuming</creatorcontrib><creatorcontrib>Liang, Hengyu</creatorcontrib><creatorcontrib>Li, Jin</creatorcontrib><creatorcontrib>Xu, Shuotao</creatorcontrib><creatorcontrib>Chen, Qi</creatorcontrib><creatorcontrib>Zhang, Qianxi</creatorcontrib><creatorcontrib>Cheng, Li</creatorcontrib><creatorcontrib>Yang, Ziyue</creatorcontrib><creatorcontrib>Yang, Fan</creatorcontrib><creatorcontrib>Yang, Yuqing</creatorcontrib><creatorcontrib>Cheng, Peng</creatorcontrib><creatorcontrib>Mao, 
Yang</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xu, Yuming</au><au>Liang, Hengyu</au><au>Li, Jin</au><au>Xu, Shuotao</au><au>Chen, Qi</au><au>Zhang, Qianxi</au><au>Cheng, Li</au><au>Yang, Ziyue</au><au>Yang, Fan</au><au>Yang, Yuqing</au><au>Cheng, Peng</au><au>Mao, Yang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SPFresh: Incremental In-Place Update for Billion-Scale Vector Search</atitle><jtitle>arXiv.org</jtitle><date>2024-10-18</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Approximate Nearest Neighbor Search (ANNS) is now widely used in various applications, ranging from information retrieval, question answering, and recommendation, to search for similar high-dimensional vectors. 
As the amount of vector data grows continuously, it becomes important to support updates to vector index, the enabling technique that allows for efficient and accurate ANNS on vectors. Because of the curse of high dimensionality, it is often costly to identify the right neighbors of a single new vector, a necessary process for index update. To amortize update costs, existing systems maintain a secondary index to accumulate updates, which are merged by the main index by global rebuilding the entire index periodically. However, this approach has high fluctuations of search latency and accuracy, not even to mention that it requires substantial resources and is extremely time-consuming for rebuilds. We introduce SPFresh, a system that supports in-place vector updates. At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol to split vector partitions and reassign vectors in the nearby partitions to adapt to data distribution shift. LIRE achieves low-overhead vector updates by only reassigning vectors at the boundary between partitions, where in a high-quality vector index the amount of such vectors are deemed small. With LIRE, SPFresh provides superior query latency and accuracy to solutions based on global rebuild, with only 1% of DRAM and less than 10% cores needed at the peak compared to the state-of-the-art, in a billion scale vector index with 1% of daily vector update rate.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2410.14452</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-10
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_2410_14452
source	arXiv.org; Free E- Journals
subjects	Computer Science - Information Retrieval; Information retrieval; Searching
title	SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T17%3A41%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SPFresh:%20Incremental%20In-Place%20Update%20for%20Billion-Scale%20Vector%20Search&rft.jtitle=arXiv.org&rft.au=Xu,%20Yuming&rft.date=2024-10-18&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2410.14452&rft_dat=%3Cproquest_arxiv%3E3118915435%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3118915435&rft_id=info:pmid/&rfr_iscdi=true
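
The description field above summarizes LIRE as splitting a vector partition and then reassigning only the vectors near the affected boundary. The following is a minimal Python sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`centroid`, `two_means`, `split_and_reassign`, `insert`), the `max_size` threshold, the use of a plain 2-means split, the choice of `n_neighbors`, and the simplification that centroids are held fixed during the reassignment pass.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def two_means(vectors, iters=10, seed=0):
    """Split vectors into two non-empty groups with a few 2-means rounds."""
    c0, c1 = random.Random(seed).sample(vectors, 2)
    half = len(vectors) // 2
    g0, g1 = vectors[:half], vectors[half:]  # fallback if clustering degenerates
    for _ in range(iters):
        a = [v for v in vectors if math.dist(v, c0) <= math.dist(v, c1)]
        b = [v for v in vectors if math.dist(v, c0) > math.dist(v, c1)]
        if not a or not b:
            break  # degenerate round: keep the last valid split
        g0, g1 = a, b
        c0, c1 = centroid(g0), centroid(g1)
    return g0, g1


def split_and_reassign(partitions, pid, n_neighbors=2):
    """Split partition `pid`, then re-check only vectors in nearby partitions.

    Hypothetical simplification: centroids are computed once after the split
    and kept fixed while vectors are moved. Returns the number of moved vectors.
    """
    new_id = max(partitions) + 1
    old = partitions.pop(pid)
    old_center = centroid(old)
    partitions[pid], partitions[new_id] = two_means(old)
    cents = {p: centroid(vs) for p, vs in partitions.items()}

    # Candidates: the two new partitions plus the closest existing neighbours.
    others = sorted(
        (p for p in partitions if p not in (pid, new_id)),
        key=lambda p: math.dist(cents[p], old_center),
    )[:n_neighbors]

    moved = 0
    for p in [pid, new_id] + others:
        keep = []
        for v in partitions[p]:
            best = min(cents, key=lambda q: math.dist(v, cents[q]))
            if best == p:
                keep.append(v)
            else:
                partitions[best].append(v)  # vector sat on the wrong side
                moved += 1
        partitions[p] = keep

    # Drop partitions that lost all of their vectors.
    for p in [k for k, vs in partitions.items() if not vs]:
        del partitions[p]
    return moved


def insert(partitions, vec, max_size):
    """Append `vec` to its nearest partition in place; split if it overflows."""
    cents = {p: centroid(vs) for p, vs in partitions.items()}
    pid = min(cents, key=lambda p: math.dist(vec, cents[p]))
    partitions[pid].append(vec)
    if len(partitions[pid]) > max_size:
        split_and_reassign(partitions, pid)
```

In this sketch `partitions` is a plain dict mapping an integer partition id to a list of coordinate tuples, and `insert(partitions, vec, max_size)` updates it in place. Only the split partition and a few neighbours are ever rescanned, which is the property the abstract credits for the low update overhead; a real system would additionally need on-disk storage, concurrency control, and deletion handling, none of which is modelled here.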