Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication
Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM)...
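
To make the operation in the summary concrete, here is a minimal software sketch of SpMV, assuming the matrix is stored in compressed sparse row (CSR) form. The record does not say which storage format Serpens uses, so CSR is only an illustrative choice, and the function name `spmv_csr` and the toy matrix are made up for this example. The indexed read `x[col_idx[k]]` is the kind of data-dependent access the summary describes as challenging.

```python
from typing import List, Sequence


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # The non-zeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read at the column index of each non-zero, so the access
            # pattern depends on the sparsity structure of the matrix.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Toy 3x3 matrix (hypothetical data, for illustration only):
# [[10,  0,  2],
#  [ 0,  3,  0],
#  [ 1,  0,  4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [16.0, 6.0, 13.0]
```

Only the five stored non-zeros are touched, which is why throughput for SpMV is usually reported relative to the number of non-zeros rather than the full matrix size.
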
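Main authors: | Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, Jason |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture |
Online access: | Order full text |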
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, Jason |
description | Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a
dense vector. SpMV plays a crucial role in many applications, from graph
analytics to deep learning. The random memory accesses of the sparse matrix
make accelerator design challenging. However, FPGAs based on high bandwidth
memory (HBM) are a good fit for designing accelerators for SpMV. In this paper, we
present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens
features (1) a general-purpose design, (2) memory-centric processing engines,
and (3) index coalescing to support the efficient processing of arbitrary
SpMVs. In an evaluation of twelve large matrices, Serpens is 1.91x and
1.76x better in terms of geomean throughput than the latest accelerators
GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse
matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. In
energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and
6.25x/4.06x better than GraphLily, Sextans, and K80, respectively.
After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s
(30,204 MTEPS) and up to 3.79x over GraphLily. The code is available at
https://github.com/UCLA-VAST/Serpens. |
doi_str_mv | 10.48550/arxiv.2111.12555 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2111_12555</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2111_12555</sourcerecordid><originalsourceid>FETCH-LOGICAL-a675-6cfe0d9f85af44868cb39603975fb0b756bf93809aecd8e37475a17acb0ae0353</originalsourceid><addsrcrecordid>eNotj8FOwzAQRH3hgAofwAn_QIJdZ2OHW6igRWoEUiuu0cZZU6M0iZwU2r8nLRxGoxmNRnqM3UkRJwZAPGA4-u94LqWM5RwArtnXhkJP7fDIc77ynzv-hG394-txxwvad-E0FQPVPLeWGgo4doG7SUtqp9RE74fQdwPxTY9hsgLH4I_RB9nzsDg0o-8bb3H0XXvDrhw2A93--4xtX563i1W0flu-LvJ1hKmGKLWORJ05A-iSxKTGVipLhco0uEpUGtLKZcqIDMnWhpRONKDUaCuBJBSoGbv_u73Aln3wewyn8gxdXqDVL0rkUyY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</title><source>arXiv.org</source><creator>Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, Jason</creator><creatorcontrib>Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, Jason</creatorcontrib><description>Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a
dense vector. SpMV plays a crucial role in many applications, from graph
analytics to deep learning. The random memory accesses of the sparse matrix
make accelerator design challenging. However, FPGAs based on high bandwidth
memory (HBM) are a good fit for designing accelerators for SpMV. In this paper, we
present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens
features (1) a general-purpose design, (2) memory-centric processing engines,
and (3) index coalescing to support the efficient processing of arbitrary
SpMVs. In an evaluation of twelve large matrices, Serpens is 1.91x and
1.76x better in terms of geomean throughput than the latest accelerators
GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse
matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. In
energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and
6.25x/4.06x better than GraphLily, Sextans, and K80, respectively.
After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s
(30,204 MTEPS) and up to 3.79x over GraphLily. The code is available at
https://github.com/UCLA-VAST/Serpens.</description><identifier>DOI: 10.48550/arxiv.2111.12555</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture</subject><creationdate>2021-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2111.12555$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2111.12555$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Song, Linghao</creatorcontrib><creatorcontrib>Chi, Yuze</creatorcontrib><creatorcontrib>Guo, Licheng</creatorcontrib><creatorcontrib>Cong, Jason</creatorcontrib><title>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</title><description>Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a
dense vector. SpMV plays a crucial role in many applications, from graph
analytics to deep learning. The random memory accesses of the sparse matrix
make accelerator design challenging. However, FPGAs based on high bandwidth
memory (HBM) are a good fit for designing accelerators for SpMV. In this paper, we
present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens
features (1) a general-purpose design, (2) memory-centric processing engines,
and (3) index coalescing to support the efficient processing of arbitrary
SpMVs. In an evaluation of twelve large matrices, Serpens is 1.91x and
1.76x better in terms of geomean throughput than the latest accelerators
GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse
matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. In
energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and
6.25x/4.06x better than GraphLily, Sextans, and K80, respectively.
After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s
(30,204 MTEPS) and up to 3.79x over GraphLily. The code is available at
https://github.com/UCLA-VAST/Serpens.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><subject>Computer Science - Hardware Architecture</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8FOwzAQRH3hgAofwAn_QIJdZ2OHW6igRWoEUiuu0cZZU6M0iZwU2r8nLRxGoxmNRnqM3UkRJwZAPGA4-u94LqWM5RwArtnXhkJP7fDIc77ynzv-hG394-txxwvad-E0FQPVPLeWGgo4doG7SUtqp9RE74fQdwPxTY9hsgLH4I_RB9nzsDg0o-8bb3H0XXvDrhw2A93--4xtX563i1W0flu-LvJ1hKmGKLWORJ05A-iSxKTGVipLhco0uEpUGtLKZcqIDMnWhpRONKDUaCuBJBSoGbv_u73Aln3wewyn8gxdXqDVL0rkUyY</recordid><startdate>20211124</startdate><enddate>20211124</enddate><creator>Song, Linghao</creator><creator>Chi, Yuze</creator><creator>Guo, Licheng</creator><creator>Cong, Jason</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20211124</creationdate><title>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</title><author>Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, Jason</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a675-6cfe0d9f85af44868cb39603975fb0b756bf93809aecd8e37475a17acb0ae0353</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><topic>Computer Science - Hardware Architecture</topic><toplevel>online_resources</toplevel><creatorcontrib>Song, Linghao</creatorcontrib><creatorcontrib>Chi, Yuze</creatorcontrib><creatorcontrib>Guo, Licheng</creatorcontrib><creatorcontrib>Cong, Jason</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Song, Linghao</au><au>Chi, 
Yuze</au><au>Guo, Licheng</au><au>Cong, Jason</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</atitle><date>2021-11-24</date><risdate>2021</risdate><abstract>Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a
dense vector. SpMV plays a crucial role in many applications, from graph
analytics to deep learning. The random memory accesses of the sparse matrix
make accelerator design challenging. However, FPGAs based on high bandwidth
memory (HBM) are a good fit for designing accelerators for SpMV. In this paper, we
present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens
features (1) a general-purpose design, (2) memory-centric processing engines,
and (3) index coalescing to support the efficient processing of arbitrary
SpMVs. In an evaluation of twelve large matrices, Serpens is 1.91x and
1.76x better in terms of geomean throughput than the latest accelerators
GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse
matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. In
energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and
6.25x/4.06x better than GraphLily, Sextans, and K80, respectively.
After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s
(30,204 MTEPS) and up to 3.79x over GraphLily. The code is available at
https://github.com/UCLA-VAST/Serpens.</abstract><doi>10.48550/arxiv.2111.12555</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2111.12555 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2111_12555 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture |
title | Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T12%3A59%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Serpens:%20A%20High%20Bandwidth%20Memory%20Based%20Accelerator%20for%20General-Purpose%20Sparse%20Matrix-Vector%20Multiplication&rft.au=Song,%20Linghao&rft.date=2021-11-24&rft_id=info:doi/10.48550/arxiv.2111.12555&rft_dat=%3Carxiv_GOX%3E2111_12555%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
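
The description field reports its comparisons as geomean (geometric mean) throughput ratios. As a rough sketch of what that aggregation means, the snippet below computes a geometric mean over per-matrix speedups. The helper name `geomean` and the three speedup values are hypothetical and are not taken from the paper's results.

```python
import math
from typing import Sequence


def geomean(ratios: Sequence[float]) -> float:
    """Geometric mean of positive ratios, computed in log space."""
    if not ratios:
        raise ValueError("geomean needs at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("geomean is only defined for positive ratios")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-matrix speedups (not data from the paper).
speedups = [0.5, 2.0, 4.0]

print(f"geometric mean:  {geomean(speedups):.3f}")
print(f"arithmetic mean: {sum(speedups) / len(speedups):.3f}")
```

The geometric mean is the usual choice for averaging ratios because a 2x slowdown on one matrix and a 2x speedup on another cancel out, whereas an arithmetic mean would report a net gain.
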