IPOCIM: Artificial Intelligent Architecture Design Space Exploration With Scalable Ping-Pong Computing-in-Memory Macro
Computing-in-memory (CIM) architecture has become a promising solution for designing energy-efficient artificial intelligence (AI) processors. Various CIM demonstrators have shown the computing efficiency of CIM macros and CIM-based processors. However, previous studies mainly focus on macro optimization and low CIM capacity without considering the weight-update strategy of the CIM architecture. In practice, an AI processor with a CIM engine faces issues including updating memory data and supporting different operators. For instance, AI-oriented applications usually contain large numbers of weight parameters, and the weights stored in the CIM array must be reloaded because of the considerable gap between the CIM capacity and the growing parameter count; this weight updating and waiting reduces the computation efficiency of the CIM architecture. In addition, the natural parallelism of CIM mismatches the various convolution kernel sizes found across networks and layers, which reduces hardware utilization efficiency. In this work, we develop a CIM engine with a ping-pong computing strategy as an alternative to the typical CIM macro plus weight buffer, hiding the data-update latency and improving the data reuse ratio. Based on the ping-pong engine, we propose a flexible CIM architecture that adapts to neural networks of different sizes, namely intelligent pong computing-in-memory (IPOCIM), with a fine-grained data flow mapping strategy. Based on the evaluation, IPOCIM achieves a 1.27-6.27x performance and 2.34-5.30x energy-efficiency improvement compared to state-of-the-art works.
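The core mechanism the abstract describes, replacing the usual CIM macro plus separate weight buffer with two alternating weight banks, can be illustrated with a short sketch. The Python model below is a minimal illustration of double-buffered ("ping-pong") weight updating under our own assumptions: the names (`PingPongEngine`, `preload`, `swap`) are hypothetical and not taken from the paper, and real CIM hardware performs the bank write and the multiply-accumulate concurrently, which plain Python can only emulate sequentially.

```python
# Minimal sketch of ping-pong (double-buffered) weight updating, as the
# abstract describes it. All class/method names here are hypothetical
# illustrations, not the paper's API or hardware interface.
import numpy as np


class PingPongEngine:
    """Two weight banks alternate roles: compute on one while the other loads."""

    def __init__(self, rows: int, cols: int):
        # Two identical weight banks; `active` indexes the bank used for compute.
        self.banks = [np.zeros((rows, cols)), np.zeros((rows, cols))]
        self.active = 0

    def preload(self, next_weights: np.ndarray) -> None:
        # Write the *inactive* bank. On real CIM hardware this write would
        # overlap with the multiply-accumulate on the active bank, hiding
        # the weight-update latency behind computation.
        self.banks[1 - self.active] = next_weights

    def compute(self, activations: np.ndarray) -> np.ndarray:
        # In-memory matrix-vector/matrix multiply on the active bank,
        # modeled here as an ordinary dot product.
        return activations @ self.banks[self.active]

    def swap(self) -> None:
        # Ping-pong: the freshly loaded bank becomes active for the next tile.
        self.active = 1 - self.active


# Usage: stream weight tiles of a layer that exceeds the CIM capacity.
# On hardware, preload of tile i+1 overlaps with compute on tile i, so the
# engine never stalls waiting for weights; the serialization below is only
# an artifact of modeling it in software.
engine = PingPongEngine(rows=64, cols=64)
tiles = [np.random.randn(64, 64) for _ in range(4)]
x = np.random.randn(8, 64)

engine.preload(tiles[0])
engine.swap()
outputs = []
for i in range(len(tiles)):
    if i + 1 < len(tiles):
        engine.preload(tiles[i + 1])  # would overlap with compute on hardware
    outputs.append(engine.compute(x))
    engine.swap()
```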
Saved in:

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024-02, Vol. 32 (2), p. 256-268
Main Authors: Chang, Liang; Zhao, Xin; Yue, Ting; Yang, Xi; Li, Chenglong; Lin, Shuisheng; Zhou, Jun
Format: Article
Language: English
Subjects: Artificial intelligence; artificial intelligent processor; computer architecture; computer memory; computing time; computing-in-memory (CIM); energy efficiency; flow mapping; in-memory computing; memory architecture; memory management; microprocessors; network latency; neural networks; parameters; ping-pong computing; table tennis
Publisher: IEEE, New York
Source: IEEE Electronic Library (IEL)
DOI: 10.1109/TVLSI.2023.3330648
ISSN: 1063-8210
EISSN: 1557-9999
CODEN: IEVSE9
Online Access: Order full text