Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of applications, including scientific computing, graph processing, and deep learning. Architecting accelerators for SpMM faces three challenges: (1) random memory access and unbalanced load in proce...
Main authors: | Song, Linghao; Chi, Yuze; Sohrabizadeh, Atefeh; Choi, Young-kyu; Lau, Jason; Cong, Jason |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Song, Linghao; Chi, Yuze; Sohrabizadeh, Atefeh; Choi, Young-kyu; Lau, Jason; Cong, Jason |
description | Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a
wide range of applications, including scientific computing, graph processing,
and deep learning. Architecting accelerators for SpMM faces three
challenges: (1) random memory access and unbalanced load in processing,
caused by the random distribution of elements in sparse matrices; (2) inefficient
data handling of large matrices that cannot fit on-chip; and (3)
non-general-purpose accelerator design, in which one accelerator can only process
a fixed-size problem. In this paper, we present Sextans, an accelerator for
general-purpose SpMM processing. The Sextans accelerator features (1) fast random
access using on-chip memory, (2) streaming access to large off-chip matrices,
(3) PE-aware non-zero scheduling for a balanced workload with an II=1 pipeline,
and (4) hardware flexibility, so that hardware prototyped once supports
SpMMs of different sizes, serving as a general-purpose accelerator. We leverage high
bandwidth memory (HBM) for efficient access to both the sparse and dense
matrices. In the evaluation, we present an FPGA prototype, Sextans, which
runs on a Xilinx U280 HBM FPGA board, and a projected prototype, Sextans-P,
with higher bandwidth, comparable to the V100, and further frequency optimization. We
conduct a comprehensive evaluation of 1,400 SpMMs on a wide range of sparse
matrices, including 50 matrices from SNAP and 150 from SuiteSparse.
We compare Sextans with NVIDIA K80 and V100 GPUs. Sextans achieves a 2.50x geomean
speedup over the K80 GPU, and Sextans-P achieves a 1.14x geomean speedup over the V100 GPU
(4.94x over the K80). The code is available at
https://github.com/linghaosong/Sextans. |
doi_str_mv | 10.48550/arxiv.2109.11081 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2109_11081</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2109_11081</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-febddb5ed77f43456d81c485dcc428077248d494eb4b0cb4621ee97f80b1cde13</originalsourceid><addsrcrecordid>eNpFj01OwzAYRL1hgQoHYIUv4GAnTuywiwqUSq1AStdE_vmMLKVO5Lgo3J60ILEYjd5mNA-hO0YzLsuSPqg4-68sZ7TOGKOSXaOPFuakwvSIG9ymCOrowydujIEeokpDxG7JBsJCPXk_xXGYALejihOQvUrRz_gJwj_sT33yY--NSn4IN-jKqX6C279eocPL82H9SnZvm-262RFVCUYcaGt1CVYIxwteVlYyszy2xvBcUiFyLi2vOWiuqdG8yhlALZykmhkLrFih-9_Zi2A3Rn9U8bs7i3YX0eIHWsdPbQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication</title><source>arXiv.org</source><creator>Song, Linghao ; Chi, Yuze ; Sohrabizadeh, Atefeh ; Choi, Young-kyu ; Lau, Jason ; Cong, Jason</creator><creatorcontrib>Song, Linghao ; Chi, Yuze ; Sohrabizadeh, Atefeh ; Choi, Young-kyu ; Lau, Jason ; Cong, Jason</creatorcontrib><description>Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a
wide range of applications, including scientific computing, graph processing,
and deep learning. Architecting accelerators for SpMM faces three
challenges: (1) random memory access and unbalanced load in processing,
caused by the random distribution of elements in sparse matrices; (2) inefficient
data handling of large matrices that cannot fit on-chip; and (3)
non-general-purpose accelerator design, in which one accelerator can only process
a fixed-size problem. In this paper, we present Sextans, an accelerator for
general-purpose SpMM processing. The Sextans accelerator features (1) fast random
access using on-chip memory, (2) streaming access to large off-chip matrices,
(3) PE-aware non-zero scheduling for a balanced workload with an II=1 pipeline,
and (4) hardware flexibility, so that hardware prototyped once supports
SpMMs of different sizes, serving as a general-purpose accelerator. We leverage high
bandwidth memory (HBM) for efficient access to both the sparse and dense
matrices. In the evaluation, we present an FPGA prototype, Sextans, which
runs on a Xilinx U280 HBM FPGA board, and a projected prototype, Sextans-P,
with higher bandwidth, comparable to the V100, and further frequency optimization. We
conduct a comprehensive evaluation of 1,400 SpMMs on a wide range of sparse
matrices, including 50 matrices from SNAP and 150 from SuiteSparse.
We compare Sextans with NVIDIA K80 and V100 GPUs. Sextans achieves a 2.50x geomean
speedup over the K80 GPU, and Sextans-P achieves a 1.14x geomean speedup over the V100 GPU
(4.94x over the K80). The code is available at
https://github.com/linghaosong/Sextans.</description><identifier>DOI: 10.48550/arxiv.2109.11081</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture</subject><creationdate>2021-09</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2109.11081$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2109.11081$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Song, Linghao</creatorcontrib><creatorcontrib>Chi, Yuze</creatorcontrib><creatorcontrib>Sohrabizadeh, Atefeh</creatorcontrib><creatorcontrib>Choi, Young-kyu</creatorcontrib><creatorcontrib>Lau, Jason</creatorcontrib><creatorcontrib>Cong, Jason</creatorcontrib><title>Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication</title><description>Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a
wide range of applications, including scientific computing, graph processing,
and deep learning. Architecting accelerators for SpMM faces three
challenges: (1) random memory access and unbalanced load in processing,
caused by the random distribution of elements in sparse matrices; (2) inefficient
data handling of large matrices that cannot fit on-chip; and (3)
non-general-purpose accelerator design, in which one accelerator can only process
a fixed-size problem. In this paper, we present Sextans, an accelerator for
general-purpose SpMM processing. The Sextans accelerator features (1) fast random
access using on-chip memory, (2) streaming access to large off-chip matrices,
(3) PE-aware non-zero scheduling for a balanced workload with an II=1 pipeline,
and (4) hardware flexibility, so that hardware prototyped once supports
SpMMs of different sizes, serving as a general-purpose accelerator. We leverage high
bandwidth memory (HBM) for efficient access to both the sparse and dense
matrices. In the evaluation, we present an FPGA prototype, Sextans, which
runs on a Xilinx U280 HBM FPGA board, and a projected prototype, Sextans-P,
with higher bandwidth, comparable to the V100, and further frequency optimization. We
conduct a comprehensive evaluation of 1,400 SpMMs on a wide range of sparse
matrices, including 50 matrices from SNAP and 150 from SuiteSparse.
We compare Sextans with NVIDIA K80 and V100 GPUs. Sextans achieves a 2.50x geomean
speedup over the K80 GPU, and Sextans-P achieves a 1.14x geomean speedup over the V100 GPU
(4.94x over the K80). The code is available at
https://github.com/linghaosong/Sextans.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><subject>Computer Science - Hardware Architecture</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpFj01OwzAYRL1hgQoHYIUv4GAnTuywiwqUSq1AStdE_vmMLKVO5Lgo3J60ILEYjd5mNA-hO0YzLsuSPqg4-68sZ7TOGKOSXaOPFuakwvSIG9ymCOrowydujIEeokpDxG7JBsJCPXk_xXGYALejihOQvUrRz_gJwj_sT33yY--NSn4IN-jKqX6C279eocPL82H9SnZvm-262RFVCUYcaGt1CVYIxwteVlYyszy2xvBcUiFyLi2vOWiuqdG8yhlALZykmhkLrFih-9_Zi2A3Rn9U8bs7i3YX0eIHWsdPbQ</recordid><startdate>20210922</startdate><enddate>20210922</enddate><creator>Song, Linghao</creator><creator>Chi, Yuze</creator><creator>Sohrabizadeh, Atefeh</creator><creator>Choi, Young-kyu</creator><creator>Lau, Jason</creator><creator>Cong, Jason</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20210922</creationdate><title>Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication</title><author>Song, Linghao ; Chi, Yuze ; Sohrabizadeh, Atefeh ; Choi, Young-kyu ; Lau, Jason ; Cong, Jason</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-febddb5ed77f43456d81c485dcc428077248d494eb4b0cb4621ee97f80b1cde13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><topic>Computer Science - Hardware Architecture</topic><toplevel>online_resources</toplevel><creatorcontrib>Song, Linghao</creatorcontrib><creatorcontrib>Chi, Yuze</creatorcontrib><creatorcontrib>Sohrabizadeh, Atefeh</creatorcontrib><creatorcontrib>Choi, Young-kyu</creatorcontrib><creatorcontrib>Lau, Jason</creatorcontrib><creatorcontrib>Cong, Jason</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Song, Linghao</au><au>Chi, Yuze</au><au>Sohrabizadeh, Atefeh</au><au>Choi, Young-kyu</au><au>Lau, Jason</au><au>Cong, Jason</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication</atitle><date>2021-09-22</date><risdate>2021</risdate><abstract>Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a
wide range of applications, including scientific computing, graph processing,
and deep learning. Architecting accelerators for SpMM faces three
challenges: (1) random memory access and unbalanced load in processing,
caused by the random distribution of elements in sparse matrices; (2) inefficient
data handling of large matrices that cannot fit on-chip; and (3)
non-general-purpose accelerator design, in which one accelerator can only process
a fixed-size problem. In this paper, we present Sextans, an accelerator for
general-purpose SpMM processing. The Sextans accelerator features (1) fast random
access using on-chip memory, (2) streaming access to large off-chip matrices,
(3) PE-aware non-zero scheduling for a balanced workload with an II=1 pipeline,
and (4) hardware flexibility, so that hardware prototyped once supports
SpMMs of different sizes, serving as a general-purpose accelerator. We leverage high
bandwidth memory (HBM) for efficient access to both the sparse and dense
matrices. In the evaluation, we present an FPGA prototype, Sextans, which
runs on a Xilinx U280 HBM FPGA board, and a projected prototype, Sextans-P,
with higher bandwidth, comparable to the V100, and further frequency optimization. We
conduct a comprehensive evaluation of 1,400 SpMMs on a wide range of sparse
matrices, including 50 matrices from SNAP and 150 from SuiteSparse.
We compare Sextans with NVIDIA K80 and V100 GPUs. Sextans achieves a 2.50x geomean
speedup over the K80 GPU, and Sextans-P achieves a 1.14x geomean speedup over the V100 GPU
(4.94x over the K80). The code is available at
https://github.com/linghaosong/Sextans.</abstract><doi>10.48550/arxiv.2109.11081</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2109.11081 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2109_11081 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Hardware Architecture |
title | Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T17%3A41%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sextans:%20A%20Streaming%20Accelerator%20for%20General-Purpose%20Sparse-Matrix%20Dense-Matrix%20Multiplication&rft.au=Song,%20Linghao&rft.date=2021-09-22&rft_id=info:doi/10.48550/arxiv.2109.11081&rft_dat=%3Carxiv_GOX%3E2109_11081%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
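The abstract names two ideas that are easy to state but easy to get wrong: the SpMM operation C = A x B itself, and scheduling non-zeros so that a pipeline with initiation interval II=1 never updates a row of C before the previous update to that row has completed. The Python sketch below illustrates both under explicit assumptions that do not come from this record: A is given as COO triples, non-zeros are assigned to processing elements (PEs) row-cyclically by `row % num_pes`, `dist` stands for a hypothetical accumulator latency, and the greedy "most remaining non-zeros first" order is just one simple heuristic. All function names are hypothetical; this is a generic illustration, not the Sextans implementation, which is available in the linked repository.

```python
# Illustrative sketch only: a generic greedy schedule under stated assumptions,
# NOT the Sextans implementation (see the linked repository for that).
from collections import deque
from typing import Dict, List, Optional, Tuple

# One non-zero of the sparse matrix A in COO form: (row, col, value).
NonZero = Tuple[int, int, float]
Slot = Optional[NonZero]  # None marks a pipeline bubble (idle slot)


def spmm_reference(nnz: List[NonZero], b: List[List[float]], m: int) -> List[List[float]]:
    """Plain SpMM C = A x B, where A is m x K in COO form and B is K x N dense."""
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for r, k, v in nnz:
        for j in range(n):
            c[r][j] += v * b[k][j]
    return c


def schedule_for_pe(nnz: List[NonZero], dist: int) -> List[Slot]:
    """Order one PE's non-zeros so that two updates to the same row of C are
    at least `dist` slots apart (dist is a hypothetical accumulator latency)."""
    pending: Dict[int, deque] = {}
    for e in nnz:
        pending.setdefault(e[0], deque()).append(e)
    last_slot: Dict[int, int] = {}
    slots: List[Slot] = []
    while pending:
        t = len(slots)
        ready = [r for r in pending if t - last_slot.get(r, -dist) >= dist]
        if not ready:
            slots.append(None)  # no hazard-free non-zero: insert a bubble
            continue
        # Heuristic: issue from the row with the most remaining non-zeros.
        r = max(ready, key=lambda row: len(pending[row]))
        slots.append(pending[r].popleft())
        last_slot[r] = t
        if not pending[r]:
            del pending[r]
    return slots


def pe_aware_schedule(nnz: List[NonZero], num_pes: int, dist: int) -> List[List[Slot]]:
    """Assume row-cyclic assignment (row % num_pes), then schedule each PE."""
    per_pe: List[List[NonZero]] = [[] for _ in range(num_pes)]
    for e in nnz:
        per_pe[e[0] % num_pes].append(e)
    return [schedule_for_pe(stream, dist) for stream in per_pe]


def run_schedule(streams: List[List[Slot]], b: List[List[float]], m: int, dist: int):
    """Replay the per-PE streams, check the hazard rule, and return (C, slots)."""
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for stream in streams:
        last: Dict[int, int] = {}
        for t, e in enumerate(stream):
            if e is None:
                continue
            r, k, v = e
            assert t - last.get(r, -dist) >= dist, "read-after-write hazard"
            last[r] = t
            for j in range(n):
                c[r][j] += v * b[k][j]
    # Assuming PEs run in parallel, the longest stream bounds the slot count.
    return c, max((len(s) for s in streams), default=0)


if __name__ == "__main__":
    # Hypothetical 4 x 3 sparse A (row 0 has three non-zeros) and 3 x 2 dense B.
    a_nnz = [(0, 0, 1.0), (0, 1, 2.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 5.0), (3, 2, 6.0)]
    b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    streams = pe_aware_schedule(a_nnz, num_pes=2, dist=3)
    c, slots = run_schedule(streams, b, m=4, dist=3)
    assert c == spmm_reference(a_nnz, b, m=4)
    print(slots)  # prints 7
```

In the usage example, PE 0 holds three non-zeros of row 0, so its stream needs three bubbles and runs for 7 slots, while PE 1 needs only 2. This kind of imbalance is what row-to-PE assignment and non-zero reordering try to reduce.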