IPOCIM: Artificial Intelligent Architecture Design Space Exploration With Scalable Ping-Pong Computing-in-Memory Macro
Computing-in-memory (CIM) architecture has become a promising solution for designing energy-efficient artificial intelligence (AI) processors. Various CIM demonstrators have shown the computing efficiency of CIM macros and CIM-based processors. However, previous studies mainly focus on macro optimization and low CIM capacity without considering the weight-update strategy of the CIM architecture. In practice, an AI processor with a CIM engine faces issues including updating memory data and supporting different operators. For instance, AI-oriented applications usually contain large numbers of weight parameters, and the weights stored in the CIM array must be reloaded because of the considerable gap between the CIM capacity and the growing parameter count; this weight updating and waiting reduces the computation efficiency of the CIM architecture. In addition, the natural parallelism of CIM mismatches the various convolution kernel sizes found across networks and layers, which reduces hardware utilization efficiency. In this work, we develop a CIM engine with a ping-pong computing strategy as an alternative to the typical CIM macro plus weight buffer, hiding the data-update latency and improving the data reuse ratio. Based on the ping-pong engine, we propose a flexible CIM architecture that adapts to neural networks of different sizes, namely intelligent pong computing-in-memory (IPOCIM), with a fine-grained data flow mapping strategy. Based on the evaluation, IPOCIM achieves a 1.27-6.27x performance and 2.34-5.30x energy-efficiency improvement compared to state-of-the-art works.
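The core mechanism the abstract describes, replacing the usual CIM macro plus separate weight buffer with two alternating weight banks, can be illustrated with a short sketch. The Python model below is a minimal illustration of double-buffered ("ping-pong") weight updating under our own assumptions: the names (`PingPongEngine`, `preload`, `swap`) are hypothetical and not taken from the paper, and real CIM hardware performs the bank write and the multiply-accumulate concurrently, which plain Python can only emulate sequentially.

```python
# Minimal sketch of ping-pong (double-buffered) weight updating, as the
# abstract describes it. All class/method names here are hypothetical
# illustrations, not the paper's API or hardware interface.
import numpy as np


class PingPongEngine:
    """Two weight banks alternate roles: compute on one while the other loads."""

    def __init__(self, rows: int, cols: int):
        # Two identical weight banks; `active` indexes the bank used for compute.
        self.banks = [np.zeros((rows, cols)), np.zeros((rows, cols))]
        self.active = 0

    def preload(self, next_weights: np.ndarray) -> None:
        # Write the *inactive* bank. On real CIM hardware this write would
        # overlap with the multiply-accumulate on the active bank, hiding
        # the weight-update latency behind computation.
        self.banks[1 - self.active] = next_weights

    def compute(self, activations: np.ndarray) -> np.ndarray:
        # In-memory matrix-vector/matrix multiply on the active bank,
        # modeled here as an ordinary dot product.
        return activations @ self.banks[self.active]

    def swap(self) -> None:
        # Ping-pong: the freshly loaded bank becomes active for the next tile.
        self.active = 1 - self.active


# Usage: stream weight tiles of a layer that exceeds the CIM capacity.
# On hardware, preload of tile i+1 overlaps with compute on tile i, so the
# engine never stalls waiting for weights; the serialization below is only
# an artifact of modeling it in software.
engine = PingPongEngine(rows=64, cols=64)
tiles = [np.random.randn(64, 64) for _ in range(4)]
x = np.random.randn(8, 64)

engine.preload(tiles[0])
engine.swap()
outputs = []
for i in range(len(tiles)):
    if i + 1 < len(tiles):
        engine.preload(tiles[i + 1])  # would overlap with compute on hardware
    outputs.append(engine.compute(x))
    engine.swap()
```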
Saved in:

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024-02, Vol. 32 (2), p. 256-268
Main Authors: Chang, Liang; Zhao, Xin; Yue, Ting; Yang, Xi; Li, Chenglong; Lin, Shuisheng; Zhou, Jun
Format: Article
Language: English
Subjects: Artificial intelligence; artificial intelligent processor; computer architecture; computer memory; computing time; computing-in-memory (CIM); energy efficiency; flow mapping; in-memory computing; memory architecture; memory management; microprocessors; network latency; neural networks; parameters; ping-pong computing; table tennis
Publisher: IEEE, New York
Source: IEEE Electronic Library (IEL)
DOI: 10.1109/TVLSI.2023.3330648
ISSN: 1063-8210
EISSN: 1557-9999
CODEN: IEVSE9
Online Access: Order full text