FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators
In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.
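To make the latency quantities in the abstract concrete, here is a minimal, hypothetical first-order model of systolic array matrix-multiply latency as a function of array shape and MAC delay. This is an illustrative sketch only, not the FAMS simulator or its notation: the function name, tiling scheme, and cycle formula are assumptions, and the model deliberately ignores memory-hierarchy stalls — the very idealization the paper argues against.

```python
from math import ceil

def systolic_latency_cycles(M, K, N, rows, cols, mac_delay=1):
    """First-order cycle estimate for an (M x K) by (K x N) matrix multiply
    on a rows x cols systolic array.

    Counts the K-deep accumulation plus pipeline fill/drain for each output
    tile, scaling every step by the MAC delay. Memory transfer delays and
    buffer capacities are ignored, so this is an idealized lower bound.
    """
    tiles = ceil(M / rows) * ceil(N / cols)   # output tiles mapped onto the array
    per_tile = K + rows + cols - 2            # accumulate + fill/drain one tile
    return tiles * per_tile * mac_delay

# A 4x4 array multiplying 8x8 matrices: 4 tiles of 14 cycles each at MAC delay 1.
print(systolic_latency_cycles(8, 8, 8, 4, 4))               # 4 * (8+4+4-2) = 56
print(systolic_latency_cycles(8, 8, 8, 4, 4, mac_delay=2))  # doubles to 112
```

Even this toy model shows why MAC delay matters to mapping quality: a delay of 2 or 3 cycles scales every pipeline step, so a notation that assumes unit-delay MACs will systematically mispredict latency.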
Saved in:
Published in: | IEEE transactions on very large scale integration (VLSI) systems, 2025-01, p.1-14 |
---|---|
Main Authors: | Sun, Hao; Shen, Junzhong; Zhang, Tian; Tang, Zhongyi; Zhang, Changwu; Li, Yuhang; Shi, Yang; Liu, Hengzhu |
Format: | Article |
Language: | English |
Subjects: | Systolic arrays; Artificial neural networks; Computer architecture; Dataflow; Deep neural network (DNN) accelerator; Design space exploration; Energy consumption; System-on-chip; Tensors |
Online Access: | Order full text |
creator | Sun, Hao; Shen, Junzhong; Zhang, Tian; Tang, Zhongyi; Zhang, Changwu; Li, Yuhang; Shi, Yang; Liu, Hengzhu |
description | In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively. |
doi_str_mv | 10.1109/TVLSI.2024.3522326 |
format | Article |
identifier | ISSN: 1063-8210; EISSN: 1557-9999; CODEN: IEVSE9; DOI: 10.1109/TVLSI.2024.3522326 |
ispartof | IEEE transactions on very large scale integration (VLSI) systems, 2025-01, p.1-14 |
issn | 1063-8210; 1557-9999 |
language | eng |
recordid | cdi_ieee_primary_10843963 |
source | IEEE Electronic Library (IEL) |
subjects | Arrays; Artificial neural networks; Computer architecture; Dataflow; deep neural network (DNN) accelerator; Delays; design space exploration; Energy consumption; Hardware; mapping space exploration; Space exploration; System-on-chip; systolic array; Systolic arrays; Tensors |
title | FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T13%3A20%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FAMS:%20A%20FrAmework%20of%20Memory-Centric%20Mapping%20for%20DNNs%20on%20Systolic%20Array%20Accelerators&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Sun,%20Hao&rft.date=2025-01-16&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2024.3522326&rft_dat=%3Ccrossref_RIE%3E10_1109_TVLSI_2024_3522326%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10843963&rfr_iscdi=true |