A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis
Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As SpMV is a memory-bound algorithm, this engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA.
Saved in:
Published in: | IEEE transactions on computer-aided design of integrated circuits and systems 2020-06, Vol.39 (6), p.1272-1285 |
---|---|
Main authors: | Hosseinabady, Mohammad ; Nunez-Yanez, Jose Luis |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 1285 |
---|---|
container_issue | 6 |
container_start_page | 1272 |
container_title | IEEE transactions on computer-aided design of integrated circuits and systems |
container_volume | 39 |
creator | Hosseinabady, Mohammad ; Nunez-Yanez, Jose Luis |
description | Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As SpMV is a memory-bound algorithm, this engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA. The main goal of this paper is to show that FPGAs can provide comparable performance for memory-bound applications to that of the corresponding CPUs and GPUs but with significantly less energy consumption. The experimental results indicate that the FPGA provides higher performance compared to that of embedded GPUs for small and medium-size matrices by an average factor of 3.25, whereas the embedded GPU is faster for larger matrices by an average factor of 1.58. In addition, the FPGA implementation is more energy efficient over the range of considered matrices by an average factor of 8.9 compared to the embedded CPU and GPU. A case study based on adapting the proposed SpMV optimization to accelerate the support vector machine (SVM) algorithm, one of the successful classification techniques in the machine learning literature, justifies the benefits of utilizing the proposed FPGA-based SpMV compared to that of the embedded CPU and GPU. The experimental results show that the FPGA is faster by an average factor of 1.7 and consumes less energy by an average factor of 6.8 compared to the GPU. |
doi_str_mv | 10.1109/TCAD.2019.2912923 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0278-0070 |
ispartof | IEEE transactions on computer-aided design of integrated circuits and systems, 2020-06, Vol.39 (6), p.1272-1285 |
issn | 0278-0070 ; 1937-4151 |
language | eng |
recordid | cdi_proquest_journals_2406702675 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Computer architecture ; Data transmission ; Edge computing ; energy ; Energy consumption ; Engines ; Field programmable gate arrays ; FPGA ; Hardware ; High level synthesis ; high-level synthesis (HLS) ; Machine learning ; Mathematical analysis ; Matrix algebra ; Matrix methods ; Multiplication ; Optimization ; Pipelining (computers) ; Sparse matrices ; sparse-matrix-vector ; Sparsity ; support vector machine (SVM) ; Support vector machines |
title | A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T01%3A42%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Streaming%20Dataflow%20Engine%20for%20Sparse%20Matrix-Vector%20Multiplication%20Using%20High-Level%20Synthesis&rft.jtitle=IEEE%20transactions%20on%20computer-aided%20design%20of%20integrated%20circuits%20and%20systems&rft.au=Hosseinabady,%20Mohammad&rft.date=2020-06-01&rft.volume=39&rft.issue=6&rft.spage=1272&rft.epage=1285&rft.pages=1272-1285&rft.issn=0278-0070&rft.eissn=1937-4151&rft.coden=ITCSDI&rft_id=info:doi/10.1109/TCAD.2019.2912923&rft_dat=%3Cproquest_RIE%3E2406702675%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2406702675&rft_id=info:pmid/&rft_ieee_id=8695747&rfr_iscdi=true |