Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM)...
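As a point of reference for the operation the abstract describes, the following is a minimal software sketch of SpMV over a matrix in compressed sparse row (CSR) form. It is not the Serpens design and is not taken from the paper. The function name `spmv_csr`, the choice of CSR, and the 3x3 example matrix are assumptions made purely for illustration. In this formulation the irregular accesses appear as reads of the dense vector `x` at positions given by the stored column indices.

```python
from typing import List, Sequence


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    Hypothetical reference routine for illustration only.
    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read at a data-dependent index: this is the
            # irregular access pattern that makes SpMV hard to accelerate.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix (an assumption, not data from the paper):
#     [[2, 0, 1],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

Each stored nonzero contributes one multiply and one add, so the work scales with the number of nonzeros rather than with the full matrix dimensions.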

Bibliographic details
Main authors: Song, Linghao; Chi, Yuze; Guo, Licheng; Cong, Jason
Format: Article
Language: English
creator Song, Linghao
Chi, Yuze
Guo, Licheng
Cong, Jason
description Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. In an evaluation on twelve large matrices, Serpens is 1.91x and 1.76x better in geomean throughput than the latest accelerators GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse matrices, on which Serpens achieves 2.10x higher throughput than a K80 GPU. In energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better than GraphLily, Sextans, and K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s (30,204 MTEPS) and up to 3.79x over GraphLily. The code is available at https://github.com/UCLA-VAST/Serpens.
doi_str_mv 10.48550/arxiv.2111.12555
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2111_12555</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2111_12555</sourcerecordid><originalsourceid>FETCH-LOGICAL-a675-6cfe0d9f85af44868cb39603975fb0b756bf93809aecd8e37475a17acb0ae0353</originalsourceid><addsrcrecordid>eNotj8FOwzAQRH3hgAofwAn_QIJdZ2OHW6igRWoEUiuu0cZZU6M0iZwU2r8nLRxGoxmNRnqM3UkRJwZAPGA4-u94LqWM5RwArtnXhkJP7fDIc77ynzv-hG394-txxwvad-E0FQPVPLeWGgo4doG7SUtqp9RE74fQdwPxTY9hsgLH4I_RB9nzsDg0o-8bb3H0XXvDrhw2A93--4xtX563i1W0flu-LvJ1hKmGKLWORJ05A-iSxKTGVipLhco0uEpUGtLKZcqIDMnWhpRONKDUaCuBJBSoGbv_u73Aln3wewyn8gxdXqDVL0rkUyY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</title><source>arXiv.org</source><creator>Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, Jason</creator><creatorcontrib>Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, Jason</creatorcontrib><description>Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM based accelerator for general-purpose SpMV.Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. From the evaluation of twelve large-size matrices, Serpens is 1.91x and 1.76x better in terms of geomean throughput than the latest accelerators GraphLiLy and Sextans, respectively. 
We also evaluate 2,519 SuiteSparse matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. For the energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better compared with GraphLily, Sextans, and K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 60.55~GFLOP/s (30,204~MTEPS) and up to 3.79x over GraphLily. The code is available at https://github.com/UCLA-VAST/Serpens.</description><identifier>DOI: 10.48550/arxiv.2111.12555</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture</subject><creationdate>2021-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2111.12555$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2111.12555$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Song, Linghao</creatorcontrib><creatorcontrib>Chi, Yuze</creatorcontrib><creatorcontrib>Guo, Licheng</creatorcontrib><creatorcontrib>Cong, Jason</creatorcontrib><title>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</title><description>Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. 
In this paper, we present Serpens, an HBM based accelerator for general-purpose SpMV.Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. From the evaluation of twelve large-size matrices, Serpens is 1.91x and 1.76x better in terms of geomean throughput than the latest accelerators GraphLiLy and Sextans, respectively. We also evaluate 2,519 SuiteSparse matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. For the energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better compared with GraphLily, Sextans, and K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 60.55~GFLOP/s (30,204~MTEPS) and up to 3.79x over GraphLily. The code is available at https://github.com/UCLA-VAST/Serpens.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><subject>Computer Science - Hardware Architecture</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8FOwzAQRH3hgAofwAn_QIJdZ2OHW6igRWoEUiuu0cZZU6M0iZwU2r8nLRxGoxmNRnqM3UkRJwZAPGA4-u94LqWM5RwArtnXhkJP7fDIc77ynzv-hG394-txxwvad-E0FQPVPLeWGgo4doG7SUtqp9RE74fQdwPxTY9hsgLH4I_RB9nzsDg0o-8bb3H0XXvDrhw2A93--4xtX563i1W0flu-LvJ1hKmGKLWORJ05A-iSxKTGVipLhco0uEpUGtLKZcqIDMnWhpRONKDUaCuBJBSoGbv_u73Aln3wewyn8gxdXqDVL0rkUyY</recordid><startdate>20211124</startdate><enddate>20211124</enddate><creator>Song, Linghao</creator><creator>Chi, Yuze</creator><creator>Guo, Licheng</creator><creator>Cong, Jason</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20211124</creationdate><title>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</title><author>Song, Linghao ; Chi, Yuze ; Guo, Licheng ; Cong, 
Jason</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a675-6cfe0d9f85af44868cb39603975fb0b756bf93809aecd8e37475a17acb0ae0353</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><topic>Computer Science - Hardware Architecture</topic><toplevel>online_resources</toplevel><creatorcontrib>Song, Linghao</creatorcontrib><creatorcontrib>Chi, Yuze</creatorcontrib><creatorcontrib>Guo, Licheng</creatorcontrib><creatorcontrib>Cong, Jason</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Song, Linghao</au><au>Chi, Yuze</au><au>Guo, Licheng</au><au>Cong, Jason</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication</atitle><date>2021-11-24</date><risdate>2021</risdate><abstract>Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM based accelerator for general-purpose SpMV.Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. From the evaluation of twelve large-size matrices, Serpens is 1.91x and 1.76x better in terms of geomean throughput than the latest accelerators GraphLiLy and Sextans, respectively. 
We also evaluate 2,519 SuiteSparse matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. For the energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better compared with GraphLily, Sextans, and K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 60.55~GFLOP/s (30,204~MTEPS) and up to 3.79x over GraphLily. The code is available at https://github.com/UCLA-VAST/Serpens.</abstract><doi>10.48550/arxiv.2111.12555</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2111.12555
language eng
recordid cdi_arxiv_primary_2111_12555
source arXiv.org
subjects Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Hardware Architecture
title Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T12%3A59%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Serpens:%20A%20High%20Bandwidth%20Memory%20Based%20Accelerator%20for%20General-Purpose%20Sparse%20Matrix-Vector%20Multiplication&rft.au=Song,%20Linghao&rft.date=2021-11-24&rft_id=info:doi/10.48550/arxiv.2111.12555&rft_dat=%3Carxiv_GOX%3E2111_12555%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true