Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator
This article proposes and implements a coarse-grained dynamically reconfigurable architecture named the Reconfigurable Multimedia Accelerator (REMAC). The REMAC architecture is driven by a pipelined multi-instruction-multi-data execution model that exploits the multi-level parallelism of the computation-int...
Published in: | ACM transactions on parallel computing 2022-09, Vol.9 (3), p.1-23 |
---|---|
Main authors: | Nguyen, Hung K. ; Tran, Xuan-Tu |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 23 |
---|---|
container_issue | 3 |
container_start_page | 1 |
container_title | ACM transactions on parallel computing |
container_volume | 9 |
creator | Nguyen, Hung K. ; Tran, Xuan-Tu |
description | This article proposes and implements a coarse-grained dynamically reconfigurable architecture named the Reconfigurable Multimedia Accelerator (REMAC). The REMAC architecture is driven by a pipelined multi-instruction-multi-data execution model that exploits the multi-level parallelism of computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop to execute concurrently in a pipelined fashion by overlapping the configuration fetch, execution, and store processes in time as much as possible. To address the large bandwidth required by the parallel processing units, the REMAC architecture exploits the abundant data locality in the kernel loops, which decreases data access bandwidth while increasing the efficiency of pipelined execution. In addition, a novel dedicated hierarchical data memory system is proposed to increase data reuse between iterations and keep data always available for the parallel operation of the RPU. The proposed architecture was modeled at the register-transfer level (RTL) in VHDL. Several benchmark applications were mapped onto REMAC to validate the flexibility and performance of the architecture and to show that it is appropriate for a wide set of multimedia applications. The experimental results show that REMAC's performance is better than that of the Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP. |
doi_str_mv | 10.1145/3543544 |
format | Article |
fullrecord | <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3543544</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1145_3543544</sourcerecordid><originalsourceid>FETCH-LOGICAL-c187t-11f171dda688c02c5073e85f0157d45572efded807f1248818d8de3c813d8cc13</originalsourceid><addsrcrecordid>eNo9kM1KAzEURoMoWLT4Ctm5Gs2dJE1mWVp_ChVBdD1ck5sSyWRKMl307VUswoHvrL7FYewGxB2A0vdSqx_UGZu1su0a1Wlz_u-qu2TzWr-EENBqs7DdjPVrqnGXOWbPN8M-0UB5wimOmY-BI1-NWCo1u4Ixk-frY8YhOkzpyN_IjTnE3aHgZyL-ckhTHMhH5EvnKFHBaSzX7CJgqjQ_7RX7eHx4Xz0329enzWq5bRxYMzUAAQx4jwtrnWidFkaS1UGANl5pbVoKnrwVJkCrrAXrrSfpLEhvnQN5xW7_fl0Zay0U-n2JA5ZjD6L_TdOf0shvJ7NWEQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator</title><source>ACM Digital Library Complete</source><creator>Nguyen, Hung K. ; Tran, Xuan-Tu</creator><creatorcontrib>Nguyen, Hung K. ; Tran, Xuan-Tu</creatorcontrib><description>This article proposes and implements a Coarse-grained dynamically Reconfigurable Architecture, named Reconfigurable Multimedia Accelerator (REMAC). REMAC architecture is driven by the pipelined multi-instruction-multi-data execution model for exploiting multi-level parallelism of the computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop can execute concurrently in the pipelining fashion by the temporal overlapping of the configuration fetch, execution, and store processes as much as possible. 
To address the huge bandwidth required by parallel processing units, REMAC architecture is proposed to efficiently exploit the abundant data locality in the kernel loops to decrease data access bandwidth while increase the efficiency of pipelined execution. In addition, a novel architecture of dedicated hierarchy data memory system is proposed to increase data reuse between iterations and make data always available for parallel operation of RPU. The proposed architecture was modeled at RTL using VHDL language. Several benchmark applications were mapped onto REMAC to validate the high-flexibility and high-performance of the architecture and prove that it is appropriate for a wide set of multimedia applications. The experimental results show that REMAC's performance is better than Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP.</description><identifier>ISSN: 2329-4949</identifier><identifier>EISSN: 2329-4957</identifier><identifier>DOI: 10.1145/3543544</identifier><language>eng</language><ispartof>ACM transactions on parallel computing, 2022-09, Vol.9 (3), p.1-23</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c187t-11f171dda688c02c5073e85f0157d45572efded807f1248818d8de3c813d8cc13</cites><orcidid>0000-0003-3417-3447 ; 0000-0003-4259-9579</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27903,27904</link.rule.ids></links><search><creatorcontrib>Nguyen, Hung K.</creatorcontrib><creatorcontrib>Tran, Xuan-Tu</creatorcontrib><title>Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator</title><title>ACM transactions on parallel computing</title><description>This article proposes and implements a Coarse-grained dynamically Reconfigurable Architecture, named Reconfigurable Multimedia Accelerator (REMAC). 
REMAC architecture is driven by the pipelined multi-instruction-multi-data execution model for exploiting multi-level parallelism of the computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop can execute concurrently in the pipelining fashion by the temporal overlapping of the configuration fetch, execution, and store processes as much as possible. To address the huge bandwidth required by parallel processing units, REMAC architecture is proposed to efficiently exploit the abundant data locality in the kernel loops to decrease data access bandwidth while increase the efficiency of pipelined execution. In addition, a novel architecture of dedicated hierarchy data memory system is proposed to increase data reuse between iterations and make data always available for parallel operation of RPU. The proposed architecture was modeled at RTL using VHDL language. Several benchmark applications were mapped onto REMAC to validate the high-flexibility and high-performance of the architecture and prove that it is appropriate for a wide set of multimedia applications. 
The experimental results show that REMAC's performance is better than Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP.</description><issn>2329-4949</issn><issn>2329-4957</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNo9kM1KAzEURoMoWLT4Ctm5Gs2dJE1mWVp_ChVBdD1ck5sSyWRKMl307VUswoHvrL7FYewGxB2A0vdSqx_UGZu1su0a1Wlz_u-qu2TzWr-EENBqs7DdjPVrqnGXOWbPN8M-0UB5wimOmY-BI1-NWCo1u4Ixk-frY8YhOkzpyN_IjTnE3aHgZyL-ckhTHMhH5EvnKFHBaSzX7CJgqjQ_7RX7eHx4Xz0329enzWq5bRxYMzUAAQx4jwtrnWidFkaS1UGANl5pbVoKnrwVJkCrrAXrrSfpLEhvnQN5xW7_fl0Zay0U-n2JA5ZjD6L_TdOf0shvJ7NWEQ</recordid><startdate>20220930</startdate><enddate>20220930</enddate><creator>Nguyen, Hung K.</creator><creator>Tran, Xuan-Tu</creator><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0003-3417-3447</orcidid><orcidid>https://orcid.org/0000-0003-4259-9579</orcidid></search><sort><creationdate>20220930</creationdate><title>Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator</title><author>Nguyen, Hung K. 
; Tran, Xuan-Tu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c187t-11f171dda688c02c5073e85f0157d45572efded807f1248818d8de3c813d8cc13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nguyen, Hung K.</creatorcontrib><creatorcontrib>Tran, Xuan-Tu</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on parallel computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nguyen, Hung K.</au><au>Tran, Xuan-Tu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator</atitle><jtitle>ACM transactions on parallel computing</jtitle><date>2022-09-30</date><risdate>2022</risdate><volume>9</volume><issue>3</issue><spage>1</spage><epage>23</epage><pages>1-23</pages><issn>2329-4949</issn><eissn>2329-4957</eissn><abstract>This article proposes and implements a Coarse-grained dynamically Reconfigurable Architecture, named Reconfigurable Multimedia Accelerator (REMAC). REMAC architecture is driven by the pipelined multi-instruction-multi-data execution model for exploiting multi-level parallelism of the computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop can execute concurrently in the pipelining fashion by the temporal overlapping of the configuration fetch, execution, and store processes as much as possible. 
To address the huge bandwidth required by parallel processing units, REMAC architecture is proposed to efficiently exploit the abundant data locality in the kernel loops to decrease data access bandwidth while increase the efficiency of pipelined execution. In addition, a novel architecture of dedicated hierarchy data memory system is proposed to increase data reuse between iterations and make data always available for parallel operation of RPU. The proposed architecture was modeled at RTL using VHDL language. Several benchmark applications were mapped onto REMAC to validate the high-flexibility and high-performance of the architecture and prove that it is appropriate for a wide set of multimedia applications. The experimental results show that REMAC's performance is better than Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP.</abstract><doi>10.1145/3543544</doi><tpages>23</tpages><orcidid>https://orcid.org/0000-0003-3417-3447</orcidid><orcidid>https://orcid.org/0000-0003-4259-9579</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2329-4949 |
ispartof | ACM transactions on parallel computing, 2022-09, Vol.9 (3), p.1-23 |
issn | 2329-4949 2329-4957 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3543544 |
source | ACM Digital Library Complete |
title | Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T08%3A31%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Design%20and%20Implementation%20of%20a%20Coarse-grained%20Dynamically%20Reconfigurable%20Multimedia%20Accelerator&rft.jtitle=ACM%20transactions%20on%20parallel%20computing&rft.au=Nguyen,%20Hung%20K.&rft.date=2022-09-30&rft.volume=9&rft.issue=3&rft.spage=1&rft.epage=23&rft.pages=1-23&rft.issn=2329-4949&rft.eissn=2329-4957&rft_id=info:doi/10.1145/3543544&rft_dat=%3Ccrossref%3E10_1145_3543544%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |