Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares
We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRA...
Published in: | IEEE transactions on very large scale integration (VLSI) systems 2017-06, Vol.25 (6), p.1793-1806 |
---|---|
Main authors: | Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1806 |
---|---|
container_issue | 6 |
container_start_page | 1793 |
container_title | IEEE transactions on very large scale integration (VLSI) systems |
container_volume | 25 |
creator | Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung |
description | We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that the multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. By exploiting the under-utilized internal bandwidth using 'compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing the parallelism of the buffered compare units in DRAM banks. We present two versions of the buffered compare architecture, a full-scale architecture and a reduced architecture, which trade off performance against energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads. |
doi_str_mv | 10.1109/TVLSI.2017.2655722 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_7850975</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7850975</ieee_id><sourcerecordid>1902275879</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</originalsourceid><addsrcrecordid>eNo9kEFPAjEQhRujiYj-Ab008bzYdre77RERhQSjQdRjs7RTKVl2sd01-u8tQpzLvEzem5l8CF1SMqCUyJvF2-xlOmCEFgOWc14wdoR6NIpExjqOmuRpIhglp-gshDUhNMsk6aH5-FuXX2Xr6g_crgBPnDFQ4-fSl1UFlQsbPK2DM4Dv5sNHPPR65VrQbech4HfXrvBtZy14MHjUbLZlHJ-jE1tWAS4OvY9e78eL0SSZPT1MR8NZopnkbUJtqnl8kYqCcM0hY0CWpdRW2swKCowA5GRpU5FrkEBNLoxdisykGgyRNO2j6_3erW8-OwitWjedr-NJRSVhrOCikNHF9i7tmxA8WLX1blP6H0WJ2rFTf-zUjp06sIuhq33IAcB_oBCcyIKnv9Ytaww</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1902275879</pqid></control><display><type>article</type><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><source>IEEE Electronic Library (IEL)</source><creator>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</creator><creatorcontrib>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</creatorcontrib><description>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. 
By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</description><identifier>ISSN: 1063-8210</identifier><identifier>EISSN: 1557-9999</identifier><identifier>DOI: 10.1109/TVLSI.2017.2655722</identifier><identifier>CODEN: IEVSE9</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accelerator architectures ; Bandwidth ; Bandwidths ; Buffers ; Commands ; Data management ; Data processing ; DRAM chips ; Dynamic random access memory ; memory architecture ; Memory devices ; Memory management ; Microprocessors ; Parallel processing ; Performance enhancement ; Random access memory ; Timing</subject><ispartof>IEEE transactions on very large scale integration (VLSI) systems, 2017-06, Vol.25 (6), p.1793-1806</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</citedby><cites>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</cites><orcidid>0000-0001-6138-6697</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7850975$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7850975$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lee, Jinho</creatorcontrib><creatorcontrib>Chung, Jongwook</creatorcontrib><creatorcontrib>Ahn, Jung Ho</creatorcontrib><creatorcontrib>Choi, Kiyoung</creatorcontrib><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><title>IEEE transactions on very large scale integration (VLSI) systems</title><addtitle>TVLSI</addtitle><description>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. 
By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</description><subject>Accelerator architectures</subject><subject>Bandwidth</subject><subject>Bandwidths</subject><subject>Buffers</subject><subject>Commands</subject><subject>Data management</subject><subject>Data processing</subject><subject>DRAM chips</subject><subject>Dynamic random access memory</subject><subject>memory architecture</subject><subject>Memory devices</subject><subject>Memory management</subject><subject>Microprocessors</subject><subject>Parallel processing</subject><subject>Performance enhancement</subject><subject>Random access memory</subject><subject>Timing</subject><issn>1063-8210</issn><issn>1557-9999</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFPAjEQhRujiYj-Ab008bzYdre77RERhQSjQdRjs7RTKVl2sd01-u8tQpzLvEzem5l8CF1SMqCUyJvF2-xlOmCEFgOWc14wdoR6NIpExjqOmuRpIhglp-gshDUhNMsk6aH5-FuXX2Xr6g_crgBPnDFQ4-fSl1UFlQsbPK2DM4Dv5sNHPPR65VrQbech4HfXrvBtZy14MHjUbLZlHJ-jE1tWAS4OvY9e78eL0SSZPT1MR8NZopnkbUJtqnl8kYqCcM0hY0CWpdRW2swKCowA5GRpU5FrkEBNLoxdisykGgyRNO2j6_3erW8-OwitWjedr-NJRSVhrOCikNHF9i7tmxA8WLX1blP6H0WJ2rFTf-zUjp06sIuhq33IAcB_oBCcyIKnv9Ytaww</recordid><startdate>20170601</startdate><enddate>20170601</enddate><creator>Lee, Jinho</creator><creator>Chung, Jongwook</creator><creator>Ahn, Jung 
Ho</creator><creator>Choi, Kiyoung</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0001-6138-6697</orcidid></search><sort><creationdate>20170601</creationdate><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><author>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Accelerator architectures</topic><topic>Bandwidth</topic><topic>Bandwidths</topic><topic>Buffers</topic><topic>Commands</topic><topic>Data management</topic><topic>Data processing</topic><topic>DRAM chips</topic><topic>Dynamic random access memory</topic><topic>memory architecture</topic><topic>Memory devices</topic><topic>Memory management</topic><topic>Microprocessors</topic><topic>Parallel processing</topic><topic>Performance enhancement</topic><topic>Random access memory</topic><topic>Timing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Jinho</creatorcontrib><creatorcontrib>Chung, Jongwook</creatorcontrib><creatorcontrib>Ahn, Jung Ho</creatorcontrib><creatorcontrib>Choi, Kiyoung</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Jinho</au><au>Chung, Jongwook</au><au>Ahn, Jung Ho</au><au>Choi, Kiyoung</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</atitle><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle><stitle>TVLSI</stitle><date>2017-06-01</date><risdate>2017</risdate><volume>25</volume><issue>6</issue><spage>1793</spage><epage>1806</epage><pages>1793-1806</pages><issn>1063-8210</issn><eissn>1557-9999</eissn><coden>IEVSE9</coden><abstract>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. 
The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TVLSI.2017.2655722</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-6138-6697</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1063-8210 |
ispartof | IEEE transactions on very large scale integration (VLSI) systems, 2017-06, Vol.25 (6), p.1793-1806 |
issn | 1063-8210 ; 1557-9999 |
language | eng |
recordid | cdi_ieee_primary_7850975 |
source | IEEE Electronic Library (IEL) |
subjects | Accelerator architectures ; Bandwidth ; Bandwidths ; Buffers ; Commands ; Data management ; Data processing ; DRAM chips ; Dynamic random access memory ; memory architecture ; Memory devices ; Memory management ; Microprocessors ; Parallel processing ; Performance enhancement ; Random access memory ; Timing |
title | Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T04%3A44%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Excavating%20the%20Hidden%20Parallelism%20Inside%20DRAM%20Architectures%20With%20Buffered%20Compares&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Lee,%20Jinho&rft.date=2017-06-01&rft.volume=25&rft.issue=6&rft.spage=1793&rft.epage=1806&rft.pages=1793-1806&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2017.2655722&rft_dat=%3Cproquest_RIE%3E1902275879%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1902275879&rft_id=info:pmid/&rft_ieee_id=7850975&rfr_iscdi=true |
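
The mechanism summarized in the description field (a small buffer and a simple ALU per bank, with one result returned for a set of commands) can be illustrated with a toy software model. This is only a sketch under stated assumptions: the names `Bank`, `fill_buffer`, `compare_n_op`, and `count_matches` are hypothetical and do not come from the paper, the "op" is assumed to be a simple match counter, and no DRAM timing or command encoding is modeled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Bank:
    """Toy model of one DRAM bank with a buffer and a small ALU (hypothetical)."""
    rows: List[List[int]]   # data words stored in this bank, grouped by row
    buffer: int = 0         # per-bank buffer holding the compare key
    result: int = 0         # result accumulated over the current command set

    def fill_buffer(self, key: int) -> None:
        # Assumed command: load the compare key and reset the accumulator.
        self.buffer = key
        self.result = 0

    def compare_n_op(self, row: int, op: Callable[[int, int], int]) -> None:
        # Assumed command: compare every word of a row with the buffered key
        # and fold each hit into the bank-local result. No data leaves the bank.
        for word in self.rows[row]:
            if word == self.buffer:
                self.result = op(self.result, 1)


def count_matches(banks: List[Bank], key: int) -> int:
    """Host-side view: issue the command set, then read one result per bank."""
    for bank in banks:
        bank.fill_buffer(key)
    for bank in banks:  # in hardware the banks could work concurrently
        for row in range(len(bank.rows)):
            bank.compare_n_op(row, lambda acc, hit: acc + hit)
    # Only one value per bank crosses the memory interface.
    return sum(bank.result for bank in banks)


banks = [Bank(rows=[[1, 2, 3], [3, 3, 4]]), Bank(rows=[[3, 0], [5, 3]])]
total = count_matches(banks, key=3)
print(total)
```

In this model the host reads two values (one per bank) instead of the ten stored words, which is the kind of reduction in processor-memory traffic the description refers to.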