Co-design Hardware and Algorithm for Vector Search

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents.

Detailed description

Saved in:
Bibliographic details
Main authors: Jiang, Wenqi; Li, Shigang; Zhu, Yu; Licht, Johannes de Fine; He, Zhenhao; Shi, Runbin; Renggli, Cedric; Zhang, Shuai; Rekatsinas, Theodoros; Hoefler, Torsten; Alonso, Gustavo
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Jiang, Wenqi
Li, Shigang
Zhu, Yu
Licht, Johannes de Fine
He, Zhenhao
Shi, Runbin
Renggli, Cedric
Zhang, Shuai
Rekatsinas, Theodoros
Hoefler, Torsten
Alonso, Gustavo
description Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce \textit{FANNS}, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, \textit{FANNS} automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. \textit{FANNS} attains up to 23.0$\times$ and 37.2$\times$ speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5$\times$ and 7.6$\times$ speedup in median and 95\textsuperscript{th} percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of \textit{FANNS} lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.
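The abstract's core operation can be illustrated in miniature. The sketch below is not the FANNS accelerator (which uses a hardware-generated FPGA design); it is a minimal, brute-force Python example of the two concepts the abstract names: scoring a query vector against document vectors to return the top-k most similar, and the recall@k metric that FANNS accepts as a user-provided requirement. All function names here are illustrative, not from the paper.

```python
# Minimal sketch of exact top-k vector search and recall@k.
# Brute-force inner product; real systems (like FANNS) use
# approximate indexes such as IVF-PQ for speed.
import heapq

def top_k(query, docs, k):
    """Return indices of the k docs with highest inner-product similarity."""
    scores = [(sum(q * d for q, d in zip(query, doc)), i)
              for i, doc in enumerate(docs)]
    return [i for _, i in heapq.nlargest(k, scores)]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors recovered by an approximate search."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]
exact = top_k(query, docs, 2)          # brute-force ground truth: [0, 1]
print(recall_at_k([0, 2], exact))      # an approximate result finding {0, 2} -> 0.5
```

Given a recall target like this (e.g., recall@10 >= 95%), FANNS searches the joint space of algorithm parameters and hardware designs to meet it within the resource budget.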
doi_str_mv 10.48550/arxiv.2306.11182
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2306.11182
language eng
recordid cdi_arxiv_primary_2306_11182
source arXiv.org
subjects Computer Science - Databases
Computer Science - Information Retrieval
Computer Science - Learning
title Co-design Hardware and Algorithm for Vector Search
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T18%3A05%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Co-design%20Hardware%20and%20Algorithm%20for%20Vector%20Search&rft.au=Jiang,%20Wenqi&rft.date=2023-06-19&rft_id=info:doi/10.48550/arxiv.2306.11182&rft_dat=%3Carxiv_GOX%3E2306_11182%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true