Exploiting HBM on FPGAs for Data Processing

Detailed description

Bibliographic details
Published in: ACM Transactions on Reconfigurable Technology and Systems, 2022-12, Vol. 15 (4), p. 1-27, Article 36
Main authors: Shi, Runbin; Kara, Kaan; Hagleitner, Christoph; Diamantopoulos, Dionysios; Syrivelis, Dimitris; Alonso, Gustavo
Format: Article
Language: English
Online access: Full text
Description
Summary: Field Programmable Gate Arrays (FPGAs) are increasingly being used in data centers and the cloud due to their potential to accelerate certain workloads as well as for their architectural flexibility, since they can be used as accelerators, smart-NICs, or stand-alone processors. To meet the challenges posed by these new use cases, FPGAs are quickly evolving in terms of their capabilities and organization. The utilization of High Bandwidth Memory (HBM) in FPGA devices is one recent example of such a trend. In this article, we study the potential of FPGAs equipped with HBM from a data analytics perspective. We consider three workloads common in analytics-oriented databases and implement them on an FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. We consider two possible configurations of the HBM, using a single and a dual clock version design. With the right design, FPGA+HBM-based solutions are able to surpass the highest performance provided by either a two-socket POWER9 system or a 14-core Xeon E5 by up to 5.9× (range selection), 18.3× (hash join), and 6.1× (SGD).
ISSN: 1936-7406, 1936-7414
DOI: 10.1145/3491238