Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator

Bibliographic Details
Published in: ACM transactions on parallel computing 2022-09, Vol.9 (3), p.1-23
Main authors: Nguyen, Hung K., Tran, Xuan-Tu
Format: Article
Language: English
Online access: Full text
container_end_page 23
container_issue 3
container_start_page 1
container_title ACM transactions on parallel computing
container_volume 9
creator Nguyen, Hung K.
Tran, Xuan-Tu
description This article proposes and implements a coarse-grained dynamically reconfigurable architecture named the Reconfigurable Multimedia Accelerator (REMAC). The REMAC architecture is driven by a pipelined multiple-instruction, multiple-data (MIMD) execution model that exploits multi-level parallelism in the computation-intensive loops of multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop to execute concurrently in a pipelined fashion by overlapping the configuration fetch, execution, and store processes in time as much as possible. To address the high bandwidth demanded by parallel processing units, the REMAC architecture exploits the abundant data locality in kernel loops to reduce data access bandwidth while increasing the efficiency of pipelined execution. In addition, a novel dedicated hierarchical data memory system is proposed to increase data reuse between iterations and keep data always available for the parallel operation of the RPU. The proposed architecture was modeled at the register-transfer level (RTL) in VHDL. Several benchmark applications were mapped onto REMAC to validate the high flexibility and high performance of the architecture and to show that it is suitable for a wide range of multimedia applications. The experimental results show that REMAC outperforms Xilinx Virtex-II, ADRES, REMUS-II, and the TI C64+ DSP.
doi_str_mv 10.1145/3543544
format Article
date 2022-09-30
orcidid https://orcid.org/0000-0003-3417-3447
https://orcid.org/0000-0003-4259-9579
toplevel peer_reviewed
fulltext fulltext
identifier ISSN: 2329-4949
ispartof ACM transactions on parallel computing, 2022-09, Vol.9 (3), p.1-23
issn 2329-4949
2329-4957
language eng
recordid cdi_crossref_primary_10_1145_3543544
source ACM Digital Library Complete
title Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T08%3A31%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Design%20and%20Implementation%20of%20a%20Coarse-grained%20Dynamically%20Reconfigurable%20Multimedia%20Accelerator&rft.jtitle=ACM%20transactions%20on%20parallel%20computing&rft.au=Nguyen,%20Hung%20K.&rft.date=2022-09-30&rft.volume=9&rft.issue=3&rft.spage=1&rft.epage=23&rft.pages=1-23&rft.issn=2329-4949&rft.eissn=2329-4957&rft_id=info:doi/10.1145/3543544&rft_dat=%3Ccrossref%3E10_1145_3543544%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true