IPOCIM: Artificial Intelligent Architecture Design Space Exploration With Scalable Ping-Pong Computing-in-Memory Macro

Computing-in-memory (CIM) architecture has become a possible solution to designing an energy-efficient artificial intelligent processor. Various CIM demonstrators indicated the computing efficiency of CIM macro and CIM-based processors. However, previous studies mainly focus on macro optimization and low CIM capacity without considering the weight update strategy of CIM architecture. The artificial intelligence (AI) processor with a CIM engine practically induces issues, including updating memory data and supporting different operators. For instance, AI-oriented applications usually contain various weight parameters. The weight stored in the CIM architecture should be reloaded for the considerable gap between the capacity of CIM and growing weight parameters. The computation efficiency of the CIM architecture is reduced by the weight updating and waiting. In addition, the natural parallelism of CIM leads to the mismatch of various convolution kernel sizes in different networks and layers, which reduces hardware utilization efficiency. In this work, we develop a CIM engine with a ping-pong computing strategy as an alternative to typical CIM macro and weight buffer, hiding the data update latency and improving the data reuse ratio. Based on the ping-pong engine, we propose a flexible CIM architecture adapting to different sizes of neural networks, namely, intelligent pong computing-in memory (IPOCIM), with a fine-grained data flow mapping strategy. Based on the evaluation, IPOCIM can achieve a 1.27-6.27× performance and 2.34-5.30× energy efficiency improvement compared to the state-of-the-art works.
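The latency-hiding idea described in the abstract can be illustrated with a simple cycle-count model of double-buffered ("ping-pong") weight loading. This is an illustrative sketch only, not the paper's implementation: the tile counts and per-tile latencies below are hypothetical, and the model assumes one weight load can fully overlap one compute pass.

```python
# Illustrative cycle-count model of ping-pong weight buffering in a CIM engine.
# All parameters (tile count, per-tile latencies) are hypothetical examples.

def total_cycles(n_tiles: int, compute_cycles: int, load_cycles: int,
                 ping_pong: bool) -> int:
    """Cycles to process n_tiles weight tiles.

    Without ping-pong, every tile's weight load stalls the compute array.
    With ping-pong, the next tile loads into the idle buffer while the
    current tile computes, so only the first load is exposed and each
    steady-state step costs max(compute, load) cycles.
    """
    if not ping_pong:
        return n_tiles * (load_cycles + compute_cycles)
    steady = max(compute_cycles, load_cycles)
    return load_cycles + n_tiles * steady

baseline = total_cycles(8, compute_cycles=100, load_cycles=60, ping_pong=False)
pingpong = total_cycles(8, compute_cycles=100, load_cycles=60, ping_pong=True)
print(baseline, pingpong)  # 1280 860
```

Under these example numbers the ping-pong scheme removes all but the first weight-load stall; the actual speedups reported by the paper (1.27-6.27×) also reflect its data-flow mapping, which this toy model does not capture.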

Detailed Description

Bibliographic Details
Published in: IEEE transactions on very large scale integration (VLSI) systems, 2024-02, Vol. 32 (2), p. 256-268
Main Authors: Chang, Liang; Zhao, Xin; Yue, Ting; Yang, Xi; Li, Chenglong; Lin, Shuisheng; Zhou, Jun
Format: Article
Language: English
Online Access: Order full text
DOI: 10.1109/TVLSI.2023.3330648
ISSN: 1063-8210
EISSN: 1557-9999
Source: IEEE Electronic Library (IEL)
Subjects:
Artificial intelligence
Artificial intelligent processor
Computer architecture
Computer memory
Computing time
computing-in-memory (CIM)
Energy efficiency
Flow mapping
In-memory computing
Memory architecture
Memory management
Microprocessors
Network latency
Neural networks
Parameters
ping-pong computing
Table tennis
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T07%3A04%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=IPOCIM:%20Artificial%20Intelligent%20Architecture%20Design%20Space%20Exploration%20With%20Scalable%20Ping-Pong%20Computing-in-Memory%20Macro&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Chang,%20Liang&rft.date=2024-02-01&rft.volume=32&rft.issue=2&rft.spage=256&rft.epage=268&rft.pages=256-268&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2023.3330648&rft_dat=%3Cproquest_RIE%3E2918029004%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918029004&rft_id=info:pmid/&rft_ieee_id=10323361&rfr_iscdi=true