Exploiting HBM on FPGAs for Data Processing

Field Programmable Gate Arrays (FPGAs) are increasingly being used in data centers and the cloud due to their potential to accelerate certain workloads as well as for their architectural flexibility, since they can be used as accelerators, smart-NICs, or stand-alone processors. To meet the challenge...

Detailed description

Bibliographic details
Published in: ACM transactions on reconfigurable technology and systems 2022-12, Vol.15 (4), p.1-27, Article 36
Main authors: Shi, Runbin; Kara, Kaan; Hagleitner, Christoph; Diamantopoulos, Dionysios; Syrivelis, Dimitris; Alonso, Gustavo
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 27
container_issue 4
container_start_page 1
container_title ACM transactions on reconfigurable technology and systems
container_volume 15
creator Shi, Runbin
Kara, Kaan
Hagleitner, Christoph
Diamantopoulos, Dionysios
Syrivelis, Dimitris
Alonso, Gustavo
description Field Programmable Gate Arrays (FPGAs) are increasingly being used in data centers and the cloud due to their potential to accelerate certain workloads as well as for their architectural flexibility, since they can be used as accelerators, smart-NICs, or stand-alone processors. To meet the challenges posed by these new use cases, FPGAs are quickly evolving in terms of their capabilities and organization. The utilization of High Bandwidth Memory (HBM) in FPGA devices is one recent example of such a trend. In this article, we study the potential of FPGAs equipped with HBM from a data analytics perspective. We consider three workloads common in analytics-oriented databases and implement them on an FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. We consider two possible configurations of the HBM, using a single and a dual clock version design. With the right design, FPGA+HBM-based solutions are able to surpass the highest performance provided by either a two-socket POWER9 system or a 14-core Xeon E5 by up to 5.9× (range selection), 18.3× (hash join), and 6.1× (SGD).
doi_str_mv 10.1145/3491238
format Article
fullrecord <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3491238</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3491238</sourcerecordid><originalsourceid>FETCH-LOGICAL-a244t-c6befec0fa5fab7784f9a3ff57804751e43259b68b29d473b6b33356e996e7c93</originalsourceid><addsrcrecordid>eNo9j0tLAzEUhYMoWKu4d5WdC5mazL15LWvtQ6jYRbsekpjISNuUZBb67630sToHzseBj5B7zgaco3gGNLwGfUF63ICsFHK8PHcmr8lNKd-MSZAae-Rp_LNbp7Zrt1909vJO05ZOFtNhoTFl-mo7Sxc5-VDKHrglV9GuS7g7Zp-sJuPlaFbNP6Zvo-G8sjViV3npQgyeRSuidUppjMZCjEJphkrwgFAL46R2tflEBU46ABAyGCOD8gb65PHw63MqJYfY7HK7sfm34az5d2yOjnvy4UBavzlDp_EPUZpJag</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Exploiting HBM on FPGAs for Data Processing</title><source>ACM Digital Library Complete</source><creator>Shi, Runbin ; Kara, Kaan ; Hagleitner, Christoph ; Diamantopoulos, Dionysios ; Syrivelis, Dimitris ; Alonso, Gustavo</creator><creatorcontrib>Shi, Runbin ; Kara, Kaan ; Hagleitner, Christoph ; Diamantopoulos, Dionysios ; Syrivelis, Dimitris ; Alonso, Gustavo</creatorcontrib><description>Field Programmable Gate Arrays (FPGAs) are increasingly being used in data centers and the cloud due to their potential to accelerate certain workloads as well as for their architectural flexibility, since they can be used as accelerators, smart-NICs, or stand-alone processors. To meet the challenges posed by these new use cases, FPGAs are quickly evolving in terms of their capabilities and organization. The utilization of High Bandwidth Memory (HBM) in FPGA devices is one recent example of such a trend. In this article, we study the potential of FPGAs equipped with HBM from a data analytics perspective. 
We consider three workloads common in analytics-oriented databases and implement them on an FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. We consider two possible configurations of the HBM, using a single and a dual clock version design. With the right design, FPGA+HBM-based solutions are able to surpass the highest performance provided by either a two-socket POWER91 system or a 14-core Xeon2 E5 by up to 5.9× (range selection), 18.3× (hash join), and 6.1× (SGD).</description><identifier>ISSN: 1936-7406</identifier><identifier>EISSN: 1936-7414</identifier><identifier>DOI: 10.1145/3491238</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Analysis and design of emerging devices and systems ; Computer systems organization ; Database management system engines ; Hardware ; Hardware accelerators ; Heterogeneous (hybrid) systems ; Information systems ; Reconfigurable computing</subject><ispartof>ACM transactions on reconfigurable technology and systems, 2022-12, Vol.15 (4), p.1-27, Article 36</ispartof><rights>Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. 
For all other uses, contact the owner/author(s).</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a244t-c6befec0fa5fab7784f9a3ff57804751e43259b68b29d473b6b33356e996e7c93</citedby><cites>FETCH-LOGICAL-a244t-c6befec0fa5fab7784f9a3ff57804751e43259b68b29d473b6b33356e996e7c93</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://dl.acm.org/doi/pdf/10.1145/3491238$$EPDF$$P50$$Gacm$$H</linktopdf><link.rule.ids>314,780,784,2282,27924,27925,40196,76228</link.rule.ids></links><search><creatorcontrib>Shi, Runbin</creatorcontrib><creatorcontrib>Kara, Kaan</creatorcontrib><creatorcontrib>Hagleitner, Christoph</creatorcontrib><creatorcontrib>Diamantopoulos, Dionysios</creatorcontrib><creatorcontrib>Syrivelis, Dimitris</creatorcontrib><creatorcontrib>Alonso, Gustavo</creatorcontrib><title>Exploiting HBM on FPGAs for Data Processing</title><title>ACM transactions on reconfigurable technology and systems</title><addtitle>ACM TRETS</addtitle><description>Field Programmable Gate Arrays (FPGAs) are increasingly being used in data centers and the cloud due to their potential to accelerate certain workloads as well as for their architectural flexibility, since they can be used as accelerators, smart-NICs, or stand-alone processors. To meet the challenges posed by these new use cases, FPGAs are quickly evolving in terms of their capabilities and organization. The utilization of High Bandwidth Memory (HBM) in FPGA devices is one recent example of such a trend. In this article, we study the potential of FPGAs equipped with HBM from a data analytics perspective. We consider three workloads common in analytics-oriented databases and implement them on an FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. 
We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. We consider two possible configurations of the HBM, using a single and a dual clock version design. With the right design, FPGA+HBM-based solutions are able to surpass the highest performance provided by either a two-socket POWER91 system or a 14-core Xeon2 E5 by up to 5.9× (range selection), 18.3× (hash join), and 6.1× (SGD).</description><subject>Analysis and design of emerging devices and systems</subject><subject>Computer systems organization</subject><subject>Database management system engines</subject><subject>Hardware</subject><subject>Hardware accelerators</subject><subject>Heterogeneous (hybrid) systems</subject><subject>Information systems</subject><subject>Reconfigurable computing</subject><issn>1936-7406</issn><issn>1936-7414</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNo9j0tLAzEUhYMoWKu4d5WdC5mazL15LWvtQ6jYRbsekpjISNuUZBb67630sToHzseBj5B7zgaco3gGNLwGfUF63ICsFHK8PHcmr8lNKd-MSZAae-Rp_LNbp7Zrt1909vJO05ZOFtNhoTFl-mo7Sxc5-VDKHrglV9GuS7g7Zp-sJuPlaFbNP6Zvo-G8sjViV3npQgyeRSuidUppjMZCjEJphkrwgFAL46R2tflEBU46ABAyGCOD8gb65PHw63MqJYfY7HK7sfm34az5d2yOjnvy4UBavzlDp_EPUZpJag</recordid><startdate>20221209</startdate><enddate>20221209</enddate><creator>Shi, Runbin</creator><creator>Kara, Kaan</creator><creator>Hagleitner, Christoph</creator><creator>Diamantopoulos, Dionysios</creator><creator>Syrivelis, Dimitris</creator><creator>Alonso, Gustavo</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20221209</creationdate><title>Exploiting HBM on FPGAs for Data Processing</title><author>Shi, Runbin ; Kara, Kaan ; Hagleitner, Christoph ; Diamantopoulos, Dionysios ; Syrivelis, Dimitris ; Alonso, 
Gustavo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a244t-c6befec0fa5fab7784f9a3ff57804751e43259b68b29d473b6b33356e996e7c93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Analysis and design of emerging devices and systems</topic><topic>Computer systems organization</topic><topic>Database management system engines</topic><topic>Hardware</topic><topic>Hardware accelerators</topic><topic>Heterogeneous (hybrid) systems</topic><topic>Information systems</topic><topic>Reconfigurable computing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shi, Runbin</creatorcontrib><creatorcontrib>Kara, Kaan</creatorcontrib><creatorcontrib>Hagleitner, Christoph</creatorcontrib><creatorcontrib>Diamantopoulos, Dionysios</creatorcontrib><creatorcontrib>Syrivelis, Dimitris</creatorcontrib><creatorcontrib>Alonso, Gustavo</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on reconfigurable technology and systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shi, Runbin</au><au>Kara, Kaan</au><au>Hagleitner, Christoph</au><au>Diamantopoulos, Dionysios</au><au>Syrivelis, Dimitris</au><au>Alonso, Gustavo</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exploiting HBM on FPGAs for Data Processing</atitle><jtitle>ACM transactions on reconfigurable technology and systems</jtitle><stitle>ACM TRETS</stitle><date>2022-12-09</date><risdate>2022</risdate><volume>15</volume><issue>4</issue><spage>1</spage><epage>27</epage><pages>1-27</pages><artnum>36</artnum><issn>1936-7406</issn><eissn>1936-7414</eissn><abstract>Field Programmable Gate Arrays (FPGAs) are increasingly being used in data centers and the cloud due to their potential to accelerate certain workloads as well as for their architectural 
flexibility, since they can be used as accelerators, smart-NICs, or stand-alone processors. To meet the challenges posed by these new use cases, FPGAs are quickly evolving in terms of their capabilities and organization. The utilization of High Bandwidth Memory (HBM) in FPGA devices is one recent example of such a trend. In this article, we study the potential of FPGAs equipped with HBM from a data analytics perspective. We consider three workloads common in analytics-oriented databases and implement them on an FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. We consider two possible configurations of the HBM, using a single and a dual clock version design. With the right design, FPGA+HBM-based solutions are able to surpass the highest performance provided by either a two-socket POWER91 system or a 14-core Xeon2 E5 by up to 5.9× (range selection), 18.3× (hash join), and 6.1× (SGD).</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3491238</doi><tpages>27</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1936-7406
ispartof ACM transactions on reconfigurable technology and systems, 2022-12, Vol.15 (4), p.1-27, Article 36
issn 1936-7406
1936-7414
language eng
recordid cdi_crossref_primary_10_1145_3491238
source ACM Digital Library Complete
subjects Analysis and design of emerging devices and systems
Computer systems organization
Database management system engines
Hardware
Hardware accelerators
Heterogeneous (hybrid) systems
Information systems
Reconfigurable computing
title Exploiting HBM on FPGAs for Data Processing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T02%3A24%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploiting%20HBM%20on%20FPGAs%20for%20Data%20Processing&rft.jtitle=ACM%20transactions%20on%20reconfigurable%20technology%20and%20systems&rft.au=Shi,%20Runbin&rft.date=2022-12-09&rft.volume=15&rft.issue=4&rft.spage=1&rft.epage=27&rft.pages=1-27&rft.artnum=36&rft.issn=1936-7406&rft.eissn=1936-7414&rft_id=info:doi/10.1145/3491238&rft_dat=%3Cacm_cross%3E3491238%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true