Look-up-Table Based Processing-in-Memory Architecture With Programmable Precision-Scaling for Deep Learning Applications
Processing in memory (PIM) architecture, with its ability to perform ultra-low-latency parallel processing, is regarded as a more suitable alternative to von Neumann computing architectures for implementing data-intensive applications such as Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN). In this article, we present a Look-up Table (LUT) based PIM architecture aimed at CNN/DNN acceleration that replaces logic-based processing with pre-calculated results stored inside the LUTs in order to perform complex computations on the DRAM memory platform. Our LUT-based DRAM-PIM architecture offers superior performance at a significantly higher energy-efficiency compared to the more conventional bit-wise parallel PIM architectures, while at the same time avoiding the fabrication challenges associated with the in-memory implementation of logic circuits. In addition, the processing elements can be programmed and re-programmed to perform virtually any operation, including operations of the Convolutional, Fully Connected, Pooling, and Activation layers of CNN/DNN. Furthermore, it is capable of operating on several combinations of bit-widths of the operand data and thereby offers a wider range of flexibility across performance, precision, and efficiency. Transmission Gate (TG) realization of the circuitry ensures a minimal footprint for the PIM architecture. Our simulations demonstrate that the proposed architecture can perform AlexNet inference at a nearly 13× faster rate and 125× higher efficiency compared to a state-of-the-art GPU, and also provides 1.35× higher throughput at 2.5× higher energy-efficiency than another recent DRAM-implemented LUT-based PIM architecture in its baseline operation mode. Moreover, it offers 12× higher frame-rate at 9× higher efficiency per frame for the lowest operand precision setting, with respect to its own baseline operation mode.
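The abstract's central idea is that arithmetic is performed by reading pre-calculated results out of look-up tables rather than by logic circuits, and that narrow-precision tables can be composed into wider operations (the basis of precision scaling). The following Python sketch illustrates that general technique under assumed parameters; the 4-bit table granularity and the names `LUT_4X4`, `lut_mul4`, and `lut_mul8` are hypothetical illustrations, not the paper's DRAM circuit.

```python
# Minimal sketch of LUT-based multiplication (illustrative, not the paper's design).

# Precompute every 4-bit x 4-bit product once; afterwards a multiply is
# a pure table lookup -- no multiplier logic is exercised.
LUT_4X4 = [[a * b for b in range(16)] for a in range(16)]

def lut_mul4(a: int, b: int) -> int:
    """4-bit x 4-bit multiply via a single table lookup."""
    return LUT_4X4[a & 0xF][b & 0xF]

def lut_mul8(a: int, b: int) -> int:
    """8-bit x 8-bit multiply composed from four 4-bit partial products,
    shifted to their place values and summed."""
    a_hi, a_lo = (a >> 4) & 0xF, a & 0xF
    b_hi, b_lo = (b >> 4) & 0xF, b & 0xF
    return ((lut_mul4(a_hi, b_hi) << 8)
            + (lut_mul4(a_hi, b_lo) << 4)
            + (lut_mul4(a_lo, b_hi) << 4)
            + lut_mul4(a_lo, b_lo))

assert lut_mul8(200, 123) == 200 * 123  # sanity check
```

In this sketch, halving the operand precision (8-bit to 4-bit) cuts the work from four lookups to one, which mirrors the performance/precision/efficiency trade-off that programmable precision scaling exploits.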
Saved in:
Published in: | IEEE transactions on parallel and distributed systems, 2022-02, Vol.33 (2), p.263-275 |
---|---|
Main authors: | Sutradhar, Purab Ranjan; Bavikadi, Sathwika; Connolly, Mark; Prajapati, Savankumar; Indovina, Mark A.; Dinakarrao, Sai Manoj Pudukotai; Ganguly, Amlan |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks; Circuits; Computer architecture; convolutional neural networks (CNN); deep neural networks (DNN); Dynamic random access memory; Efficiency; Logic circuits; look-up table (LUT); Lookup tables; Machine learning; Network latency; Neural networks; Optimization; Parallel processing; Performance evaluation; Processing in memory (PIM); Random access memory; Registers; Table lookup; Transmission gates |
Online Access: | Request full text |
DOI: | 10.1109/TPDS.2021.3066909 |
Publisher: | New York: IEEE |
ISSN: | 1045-9219 |
EISSN: | 1558-2183 |
Source: | IEEE Electronic Library (IEL) |