Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration
The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a...
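Published in: | IEEE transactions on computer-aided design of integrated circuits and systems 2024-09, Vol.43 (9), p.2661-2673 |
---|---|
Main authors: | Ryu, Sungju ; Jang, Jaeyong ; Oh, Youngtaek ; Kim, Jae-Joon |
Format: | Article |
Language: | eng |
Subjects: | Computational modeling ; Computer architecture ; Convolution ; Dataflow ; deep neural network ; depthwise convolution ; Design automation ; distributed memory ; hardware accelerator ; MobileNet ; processing element (PE) ; Random access memory ; Static random access memory ; System-on-chip |
Online access: | Order full text |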
container_end_page | 2673 |
---|---|
container_issue | 9 |
container_start_page | 2661 |
container_title | IEEE transactions on computer-aided design of integrated circuits and systems |
container_volume | 43 |
creator | Ryu, Sungju ; Jang, Jaeyong ; Oh, Youngtaek ; Kim, Jae-Joon |
description | The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the standard convolution and uses fewer input/weight pairs per dot product, which leads to extremely low MAC utilization. This article proposes the Mobileware architecture for high-performance acceleration of MobileNet workloads. A new channel-stationary dataflow architecture distributes the on-chip buffers and places the distributed SRAMs near each processing element (PE), so that PEs and SRAMs can communicate with high bandwidth. The Mobileware architecture achieves 1.4× to 29.5× higher throughput than a conventional weight-stationary hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board. |
doi_str_mv | 10.1109/TCAD.2024.3380555 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0278-0070 |
ispartof | IEEE transactions on computer-aided design of integrated circuits and systems, 2024-09, Vol.43 (9), p.2661-2673 |
issn | 0278-0070 (print) ; 1937-4151 (electronic) |
language | eng |
recordid | cdi_ieee_primary_10477545 |
source | IEEE Electronic Library (IEL) |
subjects | Computational modeling ; Computer architecture ; Convolution ; Dataflow ; deep neural network ; depthwise convolution ; Design automation ; distributed memory ; hardware accelerator ; MobileNet ; processing element (PE) ; Random access memory ; Static random access memory ; System-on-chip |
title | Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T13%3A39%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mobileware:%20Distributed%20Architecture%20With%20Channel%20Stationary%20Dataflow%20for%20MobileNet%20Acceleration&rft.jtitle=IEEE%20transactions%20on%20computer-aided%20design%20of%20integrated%20circuits%20and%20systems&rft.au=Ryu,%20Sungju&rft.date=2024-09-01&rft.volume=43&rft.issue=9&rft.spage=2661&rft.epage=2673&rft.pages=2661-2673&rft.issn=0278-0070&rft.eissn=1937-4151&rft.coden=ITCSDI&rft_id=info:doi/10.1109/TCAD.2024.3380555&rft_dat=%3Cproquest_RIE%3E3096079467%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3096079467&rft_id=info:pmid/&rft_ieee_id=10477545&rfr_iscdi=true |
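
The abstract's point that a depthwise convolution feeds fewer input/weight pairs into each dot product can be illustrated with a little arithmetic. The sketch below is not taken from the article: the function names, the 3×3 kernel, the 64 input channels, and the 64-lane MAC array are all hypothetical values chosen for illustration, and the utilization formula is a deliberately simple toy model (one dot product mapped onto a fixed-width MAC array), not the authors' evaluation method.

```python
import math


def pairs_standard(kernel_h: int, kernel_w: int, in_channels: int) -> int:
    """Input/weight pairs in one output value of a standard convolution.

    Each output sums over the full kernel window across all input channels.
    """
    return kernel_h * kernel_w * in_channels


def pairs_depthwise(kernel_h: int, kernel_w: int) -> int:
    """Input/weight pairs in one output value of a depthwise convolution.

    Each output channel sees only its own input channel, so the channel
    dimension drops out of the dot product.
    """
    return kernel_h * kernel_w


def toy_mac_utilization(pairs: int, lanes: int) -> float:
    """Toy model (an assumption, not from the article): a single dot product
    is spread over a MAC array with `lanes` multipliers per cycle, and lanes
    left over in the last cycle stay idle."""
    cycles = math.ceil(pairs / lanes)
    return pairs / (cycles * lanes)


if __name__ == "__main__":
    lanes = 64  # hypothetical MAC array width
    std = pairs_standard(3, 3, 64)
    dw = pairs_depthwise(3, 3)
    print("standard pairs:", std, "utilization:", toy_mac_utilization(std, lanes))
    print("depthwise pairs:", dw, "utilization:", toy_mac_utilization(dw, lanes))
```

Under these assumed numbers the standard convolution fills the array while the depthwise dot product occupies only 9 of 64 lanes, which is the kind of gap the abstract describes as extremely low MAC utilization.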