FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators

In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic a...

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on very large scale integration (VLSI) systems 2025-01, p.1-14
Main authors: Sun, Hao; Shen, Junzhong; Zhang, Tian; Tang, Zhongyi; Zhang, Changwu; Li, Yuhang; Shi, Yang; Liu, Hengzhu
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 14
container_issue
container_start_page 1
container_title IEEE transactions on very large scale integration (VLSI) systems
container_volume
creator Sun, Hao
Shen, Junzhong
Zhang, Tian
Tang, Zhongyi
Zhang, Changwu
Li, Yuhang
Shi, Yang
Liu, Hengzhu
description In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.
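The abstract's headline sensitivity result — that a MAC delay of 2 or 3 clock cycles materially changes achievable performance — can be illustrated with a deliberately simplified cycle model. The function below is a toy sketch under our own assumptions (output-stationary dataflow, perfect tiling, no memory stalls, a linear pipeline fill/drain term); it is not the FAMS cost model and does not reproduce the paper's simulator.

```python
import math

def systolic_latency(M, K, N, rows, cols, mac_delay=1):
    """Toy cycle estimate for a rows x cols output-stationary systolic
    array computing an (M x K) @ (K x N) matrix multiply.

    Illustrative assumptions only (NOT the FAMS simulator): each output
    tile streams K operand pairs through every PE, each MAC costs
    `mac_delay` cycles, and the array pays one diagonal fill/drain of
    (rows + cols - 2) cycles per tile.
    """
    # Number of output tiles needed to cover the M x N result.
    tiles = math.ceil(M / rows) * math.ceil(N / cols)
    # Cycles per tile: K serialized MACs plus pipeline fill/drain.
    per_tile = K * mac_delay + (rows + cols - 2)
    return tiles * per_tile
```

Even in this crude model, raising `mac_delay` from 1 to 2 on a 16x16 array inflates the per-tile cost from 46 to 62 cycles, which is the kind of effect a mapping notation that ignores operation delays cannot express.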
doi_str_mv 10.1109/TVLSI.2024.3522326
format Article
fullrecord <record><control><sourceid>crossref_RIE</sourceid><recordid>TN_cdi_ieee_primary_10843963</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10843963</ieee_id><sourcerecordid>10_1109_TVLSI_2024_3522326</sourcerecordid><originalsourceid>FETCH-LOGICAL-c149t-341d3844aa518bacb178fa2f90e44c26f33b981bde17a28eee6115f8df239dfb3</originalsourceid><addsrcrecordid>eNpNkEFOwzAQRS0EEqVwAcTCF0jx2E5qs4sKhUppWaSwjRzHRoG2rsaRUG7flHbB38yXRu8vHiH3wCYATD-uP4tyMeGMy4lIORc8uyAjSNNpoodcDp1lIlEc2DW5ifGbMZBSsxEp5_myfKI5nWO-db8Bf2jwdOm2Aftk5nYdtpYuzX7f7r6oD0ifV6tIw46WfezCZnjmiKanubVu49B0AeMtufJmE93d-Y7Jx_xlPXtLivfXxSwvEgtSd4mQ0AglpTEpqNrYGqbKG-41c1Jannkhaq2gbhxMDVfOuQwg9arxXOjG12JM-GnXYogRna_22G4N9hWw6qil-tNSHbVUZy0D9HCC2mHwH6Ck0JkQBzbzXqs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</title><source>IEEE Electronic Library (IEL)</source><creator>Sun, Hao ; Shen, Junzhong ; Zhang, Tian ; Tang, Zhongyi ; Zhang, Changwu ; Li, Yuhang ; Shi, Yang ; Liu, Hengzhu</creator><creatorcontrib>Sun, Hao ; Shen, Junzhong ; Zhang, Tian ; Tang, Zhongyi ; Zhang, Changwu ; Li, Yuhang ; Shi, Yang ; Liu, Hengzhu</creatorcontrib><description>In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. 
This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. 
Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.</description><identifier>ISSN: 1063-8210</identifier><identifier>EISSN: 1557-9999</identifier><identifier>DOI: 10.1109/TVLSI.2024.3522326</identifier><identifier>CODEN: IEVSE9</identifier><language>eng</language><publisher>IEEE</publisher><subject>Arrays ; Artificial neural networks ; Computer architecture ; Dataflow ; deep neural network (DNN) accelerator ; Delays ; design space exploration ; Energy consumption ; Hardware ; mapping space exploration ; Space exploration ; System-on-chip ; systolic array ; Systolic arrays ; Tensors</subject><ispartof>IEEE transactions on very large scale integration (VLSI) systems, 2025-01, p.1-14</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-5786-3171 ; 0000-0002-2938-2714 ; 0000-0002-2918-8647 ; 0000-0001-6233-6800 ; 0000-0003-4912-0364</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10843963$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10843963$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Sun, Hao</creatorcontrib><creatorcontrib>Shen, Junzhong</creatorcontrib><creatorcontrib>Zhang, Tian</creatorcontrib><creatorcontrib>Tang, Zhongyi</creatorcontrib><creatorcontrib>Zhang, Changwu</creatorcontrib><creatorcontrib>Li, Yuhang</creatorcontrib><creatorcontrib>Shi, Yang</creatorcontrib><creatorcontrib>Liu, Hengzhu</creatorcontrib><title>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</title><title>IEEE transactions on very large scale integration (VLSI) 
systems</title><addtitle>TVLSI</addtitle><description>In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. 
Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.</description><subject>Arrays</subject><subject>Artificial neural networks</subject><subject>Computer architecture</subject><subject>Dataflow</subject><subject>deep neural network (DNN) accelerator</subject><subject>Delays</subject><subject>design space exploration</subject><subject>Energy consumption</subject><subject>Hardware</subject><subject>mapping space exploration</subject><subject>Space exploration</subject><subject>System-on-chip</subject><subject>systolic array</subject><subject>Systolic arrays</subject><subject>Tensors</subject><issn>1063-8210</issn><issn>1557-9999</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkEFOwzAQRS0EEqVwAcTCF0jx2E5qs4sKhUppWaSwjRzHRoG2rsaRUG7flHbB38yXRu8vHiH3wCYATD-uP4tyMeGMy4lIORc8uyAjSNNpoodcDp1lIlEc2DW5ifGbMZBSsxEp5_myfKI5nWO-db8Bf2jwdOm2Aftk5nYdtpYuzX7f7r6oD0ifV6tIw46WfezCZnjmiKanubVu49B0AeMtufJmE93d-Y7Jx_xlPXtLivfXxSwvEgtSd4mQ0AglpTEpqNrYGqbKG-41c1Jannkhaq2gbhxMDVfOuQwg9arxXOjG12JM-GnXYogRna_22G4N9hWw6qil-tNSHbVUZy0D9HCC2mHwH6Ck0JkQBzbzXqs</recordid><startdate>20250116</startdate><enddate>20250116</enddate><creator>Sun, Hao</creator><creator>Shen, Junzhong</creator><creator>Zhang, Tian</creator><creator>Tang, Zhongyi</creator><creator>Zhang, Changwu</creator><creator>Li, Yuhang</creator><creator>Shi, Yang</creator><creator>Liu, 
Hengzhu</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-5786-3171</orcidid><orcidid>https://orcid.org/0000-0002-2938-2714</orcidid><orcidid>https://orcid.org/0000-0002-2918-8647</orcidid><orcidid>https://orcid.org/0000-0001-6233-6800</orcidid><orcidid>https://orcid.org/0000-0003-4912-0364</orcidid></search><sort><creationdate>20250116</creationdate><title>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</title><author>Sun, Hao ; Shen, Junzhong ; Zhang, Tian ; Tang, Zhongyi ; Zhang, Changwu ; Li, Yuhang ; Shi, Yang ; Liu, Hengzhu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c149t-341d3844aa518bacb178fa2f90e44c26f33b981bde17a28eee6115f8df239dfb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Arrays</topic><topic>Artificial neural networks</topic><topic>Computer architecture</topic><topic>Dataflow</topic><topic>deep neural network (DNN) accelerator</topic><topic>Delays</topic><topic>design space exploration</topic><topic>Energy consumption</topic><topic>Hardware</topic><topic>mapping space exploration</topic><topic>Space exploration</topic><topic>System-on-chip</topic><topic>systolic array</topic><topic>Systolic arrays</topic><topic>Tensors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Hao</creatorcontrib><creatorcontrib>Shen, Junzhong</creatorcontrib><creatorcontrib>Zhang, Tian</creatorcontrib><creatorcontrib>Tang, Zhongyi</creatorcontrib><creatorcontrib>Zhang, Changwu</creatorcontrib><creatorcontrib>Li, Yuhang</creatorcontrib><creatorcontrib>Shi, Yang</creatorcontrib><creatorcontrib>Liu, Hengzhu</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 
1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sun, Hao</au><au>Shen, Junzhong</au><au>Zhang, Tian</au><au>Tang, Zhongyi</au><au>Zhang, Changwu</au><au>Li, Yuhang</au><au>Shi, Yang</au><au>Liu, Hengzhu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</atitle><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle><stitle>TVLSI</stitle><date>2025-01-16</date><risdate>2025</risdate><spage>1</spage><epage>14</epage><pages>1-14</pages><issn>1063-8210</issn><eissn>1557-9999</eissn><coden>IEVSE9</coden><abstract>In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. 
Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.</abstract><pub>IEEE</pub><doi>10.1109/TVLSI.2024.3522326</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-5786-3171</orcidid><orcidid>https://orcid.org/0000-0002-2938-2714</orcidid><orcidid>https://orcid.org/0000-0002-2918-8647</orcidid><orcidid>https://orcid.org/0000-0001-6233-6800</orcidid><orcidid>https://orcid.org/0000-0003-4912-0364</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1063-8210
ispartof IEEE transactions on very large scale integration (VLSI) systems, 2025-01, p.1-14
issn 1063-8210
1557-9999
language eng
recordid cdi_ieee_primary_10843963
source IEEE Electronic Library (IEL)
subjects Arrays
Artificial neural networks
Computer architecture
Dataflow
deep neural network (DNN) accelerator
Delays
design space exploration
Energy consumption
Hardware
mapping space exploration
Space exploration
System-on-chip
systolic array
Systolic arrays
Tensors
title FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T13%3A20%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FAMS:%20A%20FrAmework%20of%20Memory-Centric%20Mapping%20for%20DNNs%20on%20Systolic%20Array%20Accelerators&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Sun,%20Hao&rft.date=2025-01-16&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2024.3522326&rft_dat=%3Ccrossref_RIE%3E10_1109_TVLSI_2024_3522326%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10843963&rfr_iscdi=true