Look-up-Table Based Processing-in-Memory Architecture With Programmable Precision-Scaling for Deep Learning Applications

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2022-02, Vol. 33 (2), pp. 263-275
Authors: Sutradhar, Purab Ranjan; Bavikadi, Sathwika; Connolly, Mark; Prajapati, Savankumar; Indovina, Mark A.; Dinakarrao, Sai Manoj Pudukotai; Ganguly, Amlan
Format: Article
Language: English
Abstract: Processing in memory (PIM) architecture, with its ability to perform ultra-low-latency parallel processing, is regarded as a more suitable alternative to von Neumann computing architectures for implementing data-intensive applications such as Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs). In this article, we present a Look-up Table (LUT) based PIM architecture aimed at CNN/DNN acceleration that replaces logic-based processing with pre-calculated results stored inside the LUTs in order to perform complex computations on the DRAM memory platform. Our LUT-based DRAM-PIM architecture offers superior performance at significantly higher energy efficiency than conventional bit-wise parallel PIM architectures, while avoiding the fabrication challenges associated with implementing logic circuits in memory. In addition, the processing elements can be programmed and re-programmed to perform virtually any operation, including the operations of the convolutional, fully connected, pooling, and activation layers of CNNs/DNNs. Furthermore, the architecture can operate on several combinations of operand bit-widths and thereby offers a wide range of trade-offs across performance, precision, and efficiency. A Transmission Gate (TG) realization of the circuitry ensures a minimal footprint for the PIM architecture. Our simulations demonstrate that the proposed architecture performs AlexNet inference nearly 13× faster and 125× more efficiently than a state-of-the-art GPU, and provides 1.35× higher throughput at 2.5× higher energy efficiency than another recent DRAM-implemented LUT-based PIM architecture in its baseline operation mode. Moreover, at its lowest operand-precision setting it offers a 12× higher frame rate at 9× higher efficiency per frame relative to its own baseline operation mode.
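To make the abstract's two core ideas concrete, the following minimal Python sketch illustrates (a) replacing arithmetic logic with a pre-computed look-up table and (b) composing narrow-precision LUT results into a wider operation, which is the essence of precision scaling. This is an illustration only, not the paper's hardware design; the function names, the 4-bit table width, and the software setting are all assumptions made for clarity.

```python
# Illustrative sketch (not from the paper): how a look-up table (LUT) can
# replace logic-based arithmetic, and how narrow-precision LUT reads can be
# composed into a wider multiply. The 4-bit operand width is an assumption.

# Pre-compute a 4-bit x 4-bit multiplication table once ("programming" the LUT).
LUT_MUL_4x4 = [[a * b for b in range(16)] for a in range(16)]

def lut_mul4(a: int, b: int) -> int:
    """4-bit multiply as a pure table read: no multiplier logic at use time."""
    return LUT_MUL_4x4[a & 0xF][b & 0xF]

def lut_mul8(a: int, b: int) -> int:
    """8-bit multiply composed from four 4-bit LUT reads (precision scaling).

    Split each operand into high/low nibbles and combine partial products:
        a*b = (aH*bH << 8) + ((aH*bL + aL*bH) << 4) + aL*bL
    """
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF
    return ((lut_mul4(a_hi, b_hi) << 8)
            + ((lut_mul4(a_hi, b_lo) + lut_mul4(a_lo, b_hi)) << 4)
            + lut_mul4(a_lo, b_lo))

# Lower operand precision needs fewer LUT reads per result (higher throughput);
# higher precision needs more reads, mirroring the performance/precision
# trade-off the abstract describes.
assert lut_mul4(7, 9) == 63
assert lut_mul8(200, 123) == 200 * 123
```

Re-filling the table with different pre-computed results would repurpose the same read mechanism for other operations, which is one way to read the abstract's claim that the processing elements can be re-programmed to perform virtually any operation.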
DOI: 10.1109/TPDS.2021.3066909
ISSN: 1045-9219
EISSN: 1558-2183
Source: IEEE Electronic Library (IEL)
Subjects:
Artificial neural networks
Circuits
Computer architecture
convolutional neural networks (CNN)
deep neural networks (DNN)
Dynamic random access memory
Efficiency
Logic circuits
look-up table (LUT)
Lookup tables
Machine learning
Network latency
Neural networks
Optimization
Parallel processing
Performance evaluation
Processing in memory (PIM)
Random access memory
Registers
Table lookup
Transmission gates