A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis

Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As SpMV is a memory-bound algorithm, the engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA. The main goal of this paper is to show that, for memory-bound applications, FPGAs can provide performance comparable to that of the corresponding CPUs and GPUs but with significantly less energy consumption. The experimental results indicate that the FPGA provides higher performance than embedded GPUs for small and medium-size matrices by an average factor of 3.25, whereas the embedded GPU is faster for larger matrices by an average factor of 1.58. In addition, the FPGA implementation is more energy efficient for the range of considered matrices by an average factor of 8.9 compared to the embedded CPU and GPU. A case study based on adapting the proposed SpMV optimization to accelerate the support vector machine (SVM) algorithm, one of the successful classification techniques in the machine learning literature, justifies the benefits of the proposed FPGA-based SpMV compared to the embedded CPU and GPU. The experimental results show that the FPGA is faster by an average factor of 1.7 and consumes less energy by an average factor of 6.8 compared to the GPU.

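The paper's engine itself is not reproduced in this record, but the abstract's combination of loop pipelining, a dataflow graph, and data streaming can be illustrated with a minimal CSR-based SpMV kernel written in Vitis-HLS-style C++. The two-stage split, the FIFO depth, and all function names below are assumptions made for this sketch, not the authors' design.

```cpp
// Minimal sketch (not the paper's engine): CSR SpMV with two dataflow
// stages connected by an hls::stream FIFO. Stage 1 streams the non-zero
// values in, gathers x through the column indices, and multiplies on the
// fly; stage 2 accumulates the products row by row. Under DATAFLOW both
// stages run concurrently, so memory reads and accumulation overlap.
#include <hls_stream.h>

typedef float data_t;

// Stage 1: pipelined loop over all non-zeros (target II=1).
static void read_and_multiply(const data_t *values, const int *col_idx,
                              const data_t *x, int nnz,
                              hls::stream<data_t> &products) {
    for (int i = 0; i < nnz; i++) {
#pragma HLS PIPELINE II=1
        products << values[i] * x[col_idx[i]]; // gather x via the column index
    }
}

// Stage 2: reduce the product stream into y using the CSR row pointer.
// (In practice the floating-point accumulation dependence limits II here
// unless the adder is restructured; this sketch ignores that.)
static void accumulate_rows(const int *row_ptr, int rows,
                            hls::stream<data_t> &products, data_t *y) {
    for (int r = 0; r < rows; r++) {
        data_t sum = 0;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; k++) {
#pragma HLS PIPELINE II=1
            sum += products.read();
        }
        y[r] = sum;
    }
}

// Top level: y = A * x for a CSR matrix A with `rows` rows and `nnz` non-zeros.
extern "C" void spmv_csr(const data_t *values, const int *col_idx,
                         const int *row_ptr, const data_t *x, data_t *y,
                         int rows, int nnz) {
#pragma HLS DATAFLOW
    hls::stream<data_t> products("products");
#pragma HLS STREAM variable=products depth=64
    read_and_multiply(values, col_idx, x, nnz, products);
    accumulate_rows(row_ptr, rows, products, y);
}
```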

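The SVM case study is only summarized in the abstract. As a plain-CPU illustration (not the paper's accelerator code) of why SVM classification maps onto SpMV, the sketch below scores an input vector against every row of a sparse weight or support-vector matrix stored in CSR form, which is the same y = A*x pattern the engine accelerates. The CsrMatrix type and svm_scores function are hypothetical names introduced for this example.

```cpp
// Illustrative mapping of SVM scoring onto SpMV (hypothetical host-side code,
// not taken from the paper). For a linear SVM, the decision value for row r
// is f_r(x) = w_r . x + b_r; stacking the (typically sparse) weight or
// support-vector rows into a CSR matrix A turns batch scoring into y = A*x.
#include <vector>

struct CsrMatrix {
    std::vector<float> values;   // non-zero entries, stored row by row
    std::vector<int>   col_idx;  // column index of each non-zero
    std::vector<int>   row_ptr;  // row r occupies [row_ptr[r], row_ptr[r+1])
};

// Compute one decision value per row of A; the inner loop is exactly the
// SpMV inner product that the FPGA engine would accelerate.
std::vector<float> svm_scores(const CsrMatrix &A, const std::vector<float> &x,
                              const std::vector<float> &bias) {
    const int rows = static_cast<int>(A.row_ptr.size()) - 1;
    std::vector<float> y(rows, 0.0f);
    for (int r = 0; r < rows; r++) {
        float sum = 0.0f;
        for (int k = A.row_ptr[r]; k < A.row_ptr[r + 1]; k++) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[r] = sum + bias[r];  // add the per-row bias term
    }
    return y;
}
```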
Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020-06, Vol. 39 (6), p. 1272-1285
Main authors: Hosseinabady, Mohammad; Nunez-Yanez, Jose Luis
Format: Article
Language: English
Online access: Order full text
DOI: 10.1109/TCAD.2019.2912923
CODEN: ITCSDI
Publisher: IEEE, New York
Full text: https://ieeexplore.ieee.org/document/8695747
ISSN: 0278-0070
EISSN: 1937-4151
Source: IEEE Electronic Library (IEL)
Subjects:
Algorithms
Computer architecture
Data transmission
Edge computing
energy
Energy consumption
Engines
Field programmable gate arrays
FPGA
Hardware
High level synthesis
high-level synthesis (HLS)
Machine learning
Mathematical analysis
Matrix algebra
Matrix methods
Multiplication
Optimization
Pipelining (computers)
Sparse matrices
sparse-matrix-vector
Sparsity
support vector machine (SVM)
Support vector machines