Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM)...
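
As a rough illustration of the operation the abstract defines, the sketch below computes y = A·x for a sparse matrix held in compressed sparse row (CSR) form. CSR, the function name `spmv_csr`, and the 3x3 example matrix are assumptions chosen for illustration; the record does not say which storage format Serpens uses. The inner loop reads `x` at positions given by the stored column indices, which is the kind of data-dependent memory access that makes SpMV hard to accelerate.

```python
from typing import List, Sequence


def spmv_csr(
    indptr: Sequence[int],
    indices: Sequence[int],
    data: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Return y = A @ x for a matrix A stored in CSR form.

    indptr[r]..indptr[r + 1] delimits the nonzeros of row r,
    indices[k] is the column of the k-th nonzero, data[k] its value.
    """
    num_rows = len(indptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # x is read at a data-dependent position (irregular access).
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 matrix: [[2, 0, 1], [0, 0, 3], [4, 5, 0]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(indptr, indices, data, x)
print(y)  # [5.0, 9.0, 14.0]
```

Each stored nonzero contributes one multiply and one add, so the work scales with the number of nonzeros rather than with the full matrix size.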

Detailed description


Bibliographic details
Main authors:	Song, Linghao; Chi, Yuze; Guo, Licheng; Cong, Jason
Format:	Article
Language:	eng
Subjects:	Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Hardware Architecture

creator	Song, Linghao; Chi, Yuze; Guo, Licheng; Cong, Jason
description	Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. In an evaluation on twelve large matrices, Serpens is 1.91x and 1.76x better in terms of geomean throughput than the latest accelerators GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. In energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better than GraphLily, Sextans, and K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s (30,204 MTEPS) and an improvement of up to 3.79x over GraphLily. The code is available at https://github.com/UCLA-VAST/Serpens.
doi_str_mv	10.48550/arxiv.2111.12555
format	Article
creationdate	2021-11-24
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2111.12555
language	eng
recordid	cdi_arxiv_primary_2111_12555
source	arXiv.org
subjects	Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Hardware Architecture
title	Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T12%3A59%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Serpens:%20A%20High%20Bandwidth%20Memory%20Based%20Accelerator%20for%20General-Purpose%20Sparse%20Matrix-Vector%20Multiplication&rft.au=Song,%20Linghao&rft.date=2021-11-24&rft_id=info:doi/10.48550/arxiv.2111.12555&rft_dat=%3Carxiv_GOX%3E2111_12555%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true