A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis
Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As SpMV is a memory-bound algorithm, this engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA.
Saved in:
Published in: | IEEE transactions on computer-aided design of integrated circuits and systems 2020-06, Vol.39 (6), p.1272-1285 |
---|---|
Main authors: | Hosseinabady, Mohammad ; Nunez-Yanez, Jose Luis |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 1285 |
---|---|
container_issue | 6 |
container_start_page | 1272 |
container_title | IEEE transactions on computer-aided design of integrated circuits and systems |
container_volume | 39 |
creator | Hosseinabady, Mohammad ; Nunez-Yanez, Jose Luis |
description | Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As SpMV is a memory-bound algorithm, this engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA. The main goal of this paper is to show that FPGAs can provide comparable performance for memory-bound applications to that of the corresponding CPUs and GPUs but with significantly less energy consumption. The experimental results indicate that the FPGA provides higher performance compared to that of embedded GPUs for small and medium-size matrices by an average factor of 3.25, whereas the embedded GPU is faster for larger matrices by an average factor of 1.58. In addition, the FPGA implementation is more energy efficient over the range of considered matrices by an average factor of 8.9 compared to the embedded CPU and GPU. A case study based on adapting the proposed SpMV optimization to accelerate the support vector machine (SVM) algorithm, one of the successful classification techniques in the machine learning literature, justifies the benefits of utilizing the proposed FPGA-based SpMV compared to that of the embedded CPU and GPU. The experimental results show that the FPGA is faster by an average factor of 1.7 and consumes less energy by an average factor of 6.8 compared to the GPU. |
doi_str_mv | 10.1109/TCAD.2019.2912923 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0278-0070 |
ispartof | IEEE transactions on computer-aided design of integrated circuits and systems, 2020-06, Vol.39 (6), p.1272-1285 |
issn | 0278-0070 ; 1937-4151 |
language | eng |
recordid | cdi_proquest_journals_2406702675 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Computer architecture ; Data transmission ; Edge computing ; energy ; Energy consumption ; Engines ; Field programmable gate arrays ; FPGA ; Hardware ; High level synthesis ; high-level synthesis (HLS) ; Machine learning ; Mathematical analysis ; Matrix algebra ; Matrix methods ; Multiplication ; Optimization ; Pipelining (computers) ; Sparse matrices ; sparse-matrix-vector ; Sparsity ; support vector machine (SVM) ; Support vector machines |
title | A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T01%3A42%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Streaming%20Dataflow%20Engine%20for%20Sparse%20Matrix-Vector%20Multiplication%20Using%20High-Level%20Synthesis&rft.jtitle=IEEE%20transactions%20on%20computer-aided%20design%20of%20integrated%20circuits%20and%20systems&rft.au=Hosseinabady,%20Mohammad&rft.date=2020-06-01&rft.volume=39&rft.issue=6&rft.spage=1272&rft.epage=1285&rft.pages=1272-1285&rft.issn=0278-0070&rft.eissn=1937-4151&rft.coden=ITCSDI&rft_id=info:doi/10.1109/TCAD.2019.2912923&rft_dat=%3Cproquest_RIE%3E2406702675%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2406702675&rft_id=info:pmid/&rft_ieee_id=8695747&rfr_iscdi=true |