Co-design Hardware and Algorithm for Vector Search

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents.

Detailed description

Saved in:
Bibliographic details
Main authors: Jiang, Wenqi; Li, Shigang; Zhu, Yu; Licht, Johannes de Fine; He, Zhenhao; Shi, Runbin; Renggli, Cedric; Zhang, Shuai; Rekatsinas, Theodoros; Hoefler, Torsten; Alonso, Gustavo
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Jiang, Wenqi
Li, Shigang
Zhu, Yu
Licht, Johannes de Fine
He, Zhenhao
Shi, Runbin
Renggli, Cedric
Zhang, Shuai
Rekatsinas, Theodoros
Hoefler, Torsten
Alonso, Gustavo
description Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce \textit{FANNS}, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, \textit{FANNS} automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. \textit{FANNS} attains up to 23.0$\times$ and 37.2$\times$ speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5$\times$ and 7.6$\times$ speedup in median and 95\textsuperscript{th} percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of \textit{FANNS} lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.
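The abstract's core operation can be illustrated in miniature. The sketch below is not the FANNS accelerator (which uses a hardware-generated FPGA design); it is a minimal, brute-force Python example of the two concepts the abstract names: scoring a query vector against document vectors to return the top-k most similar, and the recall@k metric that FANNS accepts as a user-provided requirement. All function names here are illustrative, not from the paper.

```python
# Minimal sketch of exact top-k vector search and recall@k.
# Brute-force inner product; real systems (like FANNS) use
# approximate indexes such as IVF-PQ for speed.
import heapq

def top_k(query, docs, k):
    """Return indices of the k docs with highest inner-product similarity."""
    scores = [(sum(q * d for q, d in zip(query, doc)), i)
              for i, doc in enumerate(docs)]
    return [i for _, i in heapq.nlargest(k, scores)]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors recovered by an approximate search."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]
exact = top_k(query, docs, 2)          # brute-force ground truth: [0, 1]
print(recall_at_k([0, 2], exact))      # an approximate result finding {0, 2} -> 0.5
```

Given a recall target like this (e.g., recall@10 >= 95%), FANNS searches the joint space of algorithm parameters and hardware designs to meet it within the resource budget.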
doi_str_mv 10.48550/arxiv.2306.11182
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2306.11182
language eng
recordid cdi_arxiv_primary_2306_11182
source arXiv.org
subjects Computer Science - Databases
Computer Science - Information Retrieval
Computer Science - Learning
title Co-design Hardware and Algorithm for Vector Search
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T18%3A05%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Co-design%20Hardware%20and%20Algorithm%20for%20Vector%20Search&rft.au=Jiang,%20Wenqi&rft.date=2023-06-19&rft_id=info:doi/10.48550/arxiv.2306.11182&rft_dat=%3Carxiv_GOX%3E2306_11182%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true