Look-up-Table Based Processing-in-Memory Architecture With Programmable Precision-Scaling for Deep Learning Applications

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2022-02, Vol. 33 (2), pp. 263-275
Authors: Sutradhar, Purab Ranjan; Bavikadi, Sathwika; Connolly, Mark; Prajapati, Savankumar; Indovina, Mark A.; Dinakarrao, Sai Manoj Pudukotai; Ganguly, Amlan
Format: Article
Language: English
Abstract: Processing in memory (PIM) architecture, with its ability to perform ultra-low-latency parallel processing, is regarded as a more suitable alternative to von Neumann computing architectures for implementing data-intensive applications such as Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs). In this article, we present a Look-up Table (LUT) based PIM architecture aimed at CNN/DNN acceleration that replaces logic-based processing with pre-calculated results stored inside the LUTs in order to perform complex computations on the DRAM memory platform. Our LUT-based DRAM-PIM architecture offers superior performance at significantly higher energy efficiency than conventional bit-wise parallel PIM architectures, while avoiding the fabrication challenges associated with implementing logic circuits in memory. In addition, the processing elements can be programmed and re-programmed to perform virtually any operation, including the operations of the convolutional, fully connected, pooling, and activation layers of CNNs/DNNs. Furthermore, the architecture can operate on several combinations of operand bit-widths and thereby offers a wide range of trade-offs across performance, precision, and efficiency. A Transmission Gate (TG) realization of the circuitry ensures a minimal footprint for the PIM architecture. Our simulations demonstrate that the proposed architecture performs AlexNet inference nearly 13× faster and 125× more efficiently than a state-of-the-art GPU, and provides 1.35× higher throughput at 2.5× higher energy efficiency than another recent DRAM-implemented LUT-based PIM architecture in its baseline operation mode. Moreover, at its lowest operand-precision setting it offers a 12× higher frame rate at 9× higher efficiency per frame relative to its own baseline operation mode.
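To make the abstract's two core ideas concrete, the following minimal Python sketch illustrates (a) replacing arithmetic logic with a pre-computed look-up table and (b) composing narrow-precision LUT results into a wider operation, which is the essence of precision scaling. This is an illustration only, not the paper's hardware design; the function names, the 4-bit table width, and the software setting are all assumptions made for clarity.

```python
# Illustrative sketch (not from the paper): how a look-up table (LUT) can
# replace logic-based arithmetic, and how narrow-precision LUT reads can be
# composed into a wider multiply. The 4-bit operand width is an assumption.

# Pre-compute a 4-bit x 4-bit multiplication table once ("programming" the LUT).
LUT_MUL_4x4 = [[a * b for b in range(16)] for a in range(16)]

def lut_mul4(a: int, b: int) -> int:
    """4-bit multiply as a pure table read: no multiplier logic at use time."""
    return LUT_MUL_4x4[a & 0xF][b & 0xF]

def lut_mul8(a: int, b: int) -> int:
    """8-bit multiply composed from four 4-bit LUT reads (precision scaling).

    Split each operand into high/low nibbles and combine partial products:
        a*b = (aH*bH << 8) + ((aH*bL + aL*bH) << 4) + aL*bL
    """
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF
    return ((lut_mul4(a_hi, b_hi) << 8)
            + ((lut_mul4(a_hi, b_lo) + lut_mul4(a_lo, b_hi)) << 4)
            + lut_mul4(a_lo, b_lo))

# Lower operand precision needs fewer LUT reads per result (higher throughput);
# higher precision needs more reads, mirroring the performance/precision
# trade-off the abstract describes.
assert lut_mul4(7, 9) == 63
assert lut_mul8(200, 123) == 200 * 123
```

Re-filling the table with different pre-computed results would repurpose the same read mechanism for other operations, which is one way to read the abstract's claim that the processing elements can be re-programmed to perform virtually any operation.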
DOI: 10.1109/TPDS.2021.3066909
ISSN: 1045-9219
EISSN: 1558-2183
Source: IEEE Electronic Library (IEL)
Subjects:
Artificial neural networks
Circuits
Computer architecture
convolutional neural networks (CNN)
deep neural networks (DNN)
Dynamic random access memory
Efficiency
Logic circuits
look-up table (LUT)
Lookup tables
Machine learning
Network latency
Neural networks
Optimization
Parallel processing
Performance evaluation
Processing in memory (PIM)
Random access memory
Registers
Table lookup
Transmission gates