Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares

We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRA...


Bibliographic details
Published in: IEEE transactions on very large scale integration (VLSI) systems 2017-06, Vol.25 (6), p.1793-1806
Main authors: Lee, Jinho; Chung, Jongwook; Ahn, Jung Ho; Choi, Kiyoung
Format: Article
Language: English
container_end_page 1806
container_issue 6
container_start_page 1793
container_title IEEE transactions on very large scale integration (VLSI) systems
container_volume 25
creator Lee, Jinho
Chung, Jongwook
Ahn, Jung Ho
Choi, Kiyoung
description We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor-memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that the multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. By exploiting the under-utilized internal bandwidth using 'compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by exploiting the parallelism of the buffered compare units in DRAM banks. We present two versions of the buffered compare architecture, a full-scale architecture and a reduced architecture, which trade off performance against energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.
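The mechanism described in the abstract (a small buffer and a simple ALU per bank, with one result returned for a whole set of commands) can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the authors' design: the names `Bank`, `fill_buffer`, `compare_row`, `buffered_search` and `host_search` are hypothetical, the only operation modelled is an equality compare that counts matches, and every word or result crossing the external bus is counted as one transfer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bank:
    """One DRAM bank with a hypothetical key buffer and compare unit."""
    rows: List[List[int]]
    key_buffer: Optional[int] = None  # filled by a hypothetical "fill" command
    match_count: int = 0              # result accumulated inside the bank

    def fill_buffer(self, key: int) -> None:
        # Hypothetical command: the host sends the search key once.
        self.key_buffer = key
        self.match_count = 0

    def compare_row(self, row: int) -> None:
        # Hypothetical command: the row is compared inside the bank,
        # so its words never travel over the external data bus.
        self.match_count += sum(
            1 for word in self.rows[row] if word == self.key_buffer
        )


def buffered_search(banks: List[Bank], key: int) -> Tuple[int, int]:
    """Count matches with in-bank compares; return (matches, bus transfers)."""
    bus_transfers = 0
    for bank in banks:
        bank.fill_buffer(key)
        bus_transfers += 1            # one key transfer per bank
    for bank in banks:                # banks would work in parallel in hardware
        for row in range(len(bank.rows)):
            bank.compare_row(row)
    total = 0
    for bank in banks:
        total += bank.match_count
        bus_transfers += 1            # one result per bank, not per command
    return total, bus_transfers


def host_search(banks: List[Bank], key: int) -> Tuple[int, int]:
    """Baseline: every word is moved to the host and compared there."""
    total = 0
    bus_transfers = 0
    for bank in banks:
        for row in bank.rows:
            for word in row:
                bus_transfers += 1    # each word crosses the external bus
                if word == key:
                    total += 1
    return total, bus_transfers


if __name__ == "__main__":
    banks = [
        Bank(rows=[[1, 2, 3, 7], [7, 7, 0, 4]]),
        Bank(rows=[[5, 7, 6, 8], [9, 9, 9, 9]]),
    ]
    print("buffered:", buffered_search(banks, 7))
    print("host:    ", host_search(banks, 7))
```

Both functions return the same match count; in this toy model the difference lies only in the number of counted bus transfers, which is the effect the abstract attributes to returning one result per set of commands. Timing, energy, and the actual command encoding are not modelled.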
doi_str_mv 10.1109/TVLSI.2017.2655722
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_7850975</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7850975</ieee_id><sourcerecordid>1902275879</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</originalsourceid><addsrcrecordid>eNo9kEFPAjEQhRujiYj-Ab008bzYdre77RERhQSjQdRjs7RTKVl2sd01-u8tQpzLvEzem5l8CF1SMqCUyJvF2-xlOmCEFgOWc14wdoR6NIpExjqOmuRpIhglp-gshDUhNMsk6aH5-FuXX2Xr6g_crgBPnDFQ4-fSl1UFlQsbPK2DM4Dv5sNHPPR65VrQbech4HfXrvBtZy14MHjUbLZlHJ-jE1tWAS4OvY9e78eL0SSZPT1MR8NZopnkbUJtqnl8kYqCcM0hY0CWpdRW2swKCowA5GRpU5FrkEBNLoxdisykGgyRNO2j6_3erW8-OwitWjedr-NJRSVhrOCikNHF9i7tmxA8WLX1blP6H0WJ2rFTf-zUjp06sIuhq33IAcB_oBCcyIKnv9Ytaww</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1902275879</pqid></control><display><type>article</type><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><source>IEEE Electronic Library (IEL)</source><creator>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</creator><creatorcontrib>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</creatorcontrib><description>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. 
By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</description><identifier>ISSN: 1063-8210</identifier><identifier>EISSN: 1557-9999</identifier><identifier>DOI: 10.1109/TVLSI.2017.2655722</identifier><identifier>CODEN: IEVSE9</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accelerator architectures ; Bandwidth ; Bandwidths ; Buffers ; Commands ; Data management ; Data processing ; DRAM chips ; Dynamic random access memory ; memory architecture ; Memory devices ; Memory management ; Microprocessors ; Parallel processing ; Performance enhancement ; Random access memory ; Timing</subject><ispartof>IEEE transactions on very large scale integration (VLSI) systems, 2017-06, Vol.25 (6), p.1793-1806</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</citedby><cites>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</cites><orcidid>0000-0001-6138-6697</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7850975$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7850975$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lee, Jinho</creatorcontrib><creatorcontrib>Chung, Jongwook</creatorcontrib><creatorcontrib>Ahn, Jung Ho</creatorcontrib><creatorcontrib>Choi, Kiyoung</creatorcontrib><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><title>IEEE transactions on very large scale integration (VLSI) systems</title><addtitle>TVLSI</addtitle><description>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. 
By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</description><subject>Accelerator architectures</subject><subject>Bandwidth</subject><subject>Bandwidths</subject><subject>Buffers</subject><subject>Commands</subject><subject>Data management</subject><subject>Data processing</subject><subject>DRAM chips</subject><subject>Dynamic random access memory</subject><subject>memory architecture</subject><subject>Memory devices</subject><subject>Memory management</subject><subject>Microprocessors</subject><subject>Parallel processing</subject><subject>Performance enhancement</subject><subject>Random access memory</subject><subject>Timing</subject><issn>1063-8210</issn><issn>1557-9999</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFPAjEQhRujiYj-Ab008bzYdre77RERhQSjQdRjs7RTKVl2sd01-u8tQpzLvEzem5l8CF1SMqCUyJvF2-xlOmCEFgOWc14wdoR6NIpExjqOmuRpIhglp-gshDUhNMsk6aH5-FuXX2Xr6g_crgBPnDFQ4-fSl1UFlQsbPK2DM4Dv5sNHPPR65VrQbech4HfXrvBtZy14MHjUbLZlHJ-jE1tWAS4OvY9e78eL0SSZPT1MR8NZopnkbUJtqnl8kYqCcM0hY0CWpdRW2swKCowA5GRpU5FrkEBNLoxdisykGgyRNO2j6_3erW8-OwitWjedr-NJRSVhrOCikNHF9i7tmxA8WLX1blP6H0WJ2rFTf-zUjp06sIuhq33IAcB_oBCcyIKnv9Ytaww</recordid><startdate>20170601</startdate><enddate>20170601</enddate><creator>Lee, Jinho</creator><creator>Chung, Jongwook</creator><creator>Ahn, Jung 
Ho</creator><creator>Choi, Kiyoung</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0001-6138-6697</orcidid></search><sort><creationdate>20170601</creationdate><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><author>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Accelerator architectures</topic><topic>Bandwidth</topic><topic>Bandwidths</topic><topic>Buffers</topic><topic>Commands</topic><topic>Data management</topic><topic>Data processing</topic><topic>DRAM chips</topic><topic>Dynamic random access memory</topic><topic>memory architecture</topic><topic>Memory devices</topic><topic>Memory management</topic><topic>Microprocessors</topic><topic>Parallel processing</topic><topic>Performance enhancement</topic><topic>Random access memory</topic><topic>Timing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Jinho</creatorcontrib><creatorcontrib>Chung, Jongwook</creatorcontrib><creatorcontrib>Ahn, Jung Ho</creatorcontrib><creatorcontrib>Choi, Kiyoung</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Jinho</au><au>Chung, Jongwook</au><au>Ahn, Jung Ho</au><au>Choi, Kiyoung</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</atitle><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle><stitle>TVLSI</stitle><date>2017-06-01</date><risdate>2017</risdate><volume>25</volume><issue>6</issue><spage>1793</spage><epage>1806</epage><pages>1793-1806</pages><issn>1063-8210</issn><eissn>1557-9999</eissn><coden>IEVSE9</coden><abstract>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. 
The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TVLSI.2017.2655722</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-6138-6697</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1063-8210
ispartof IEEE transactions on very large scale integration (VLSI) systems, 2017-06, Vol.25 (6), p.1793-1806
issn 1063-8210
1557-9999
language eng
recordid cdi_ieee_primary_7850975
source IEEE Electronic Library (IEL)
subjects Accelerator architectures
Bandwidth
Bandwidths
Buffers
Commands
Data management
Data processing
DRAM chips
Dynamic random access memory
memory architecture
Memory devices
Memory management
Microprocessors
Parallel processing
Performance enhancement
Random access memory
Timing
title Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T04%3A44%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Excavating%20the%20Hidden%20Parallelism%20Inside%20DRAM%20Architectures%20With%20Buffered%20Compares&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Lee,%20Jinho&rft.date=2017-06-01&rft.volume=25&rft.issue=6&rft.spage=1793&rft.epage=1806&rft.pages=1793-1806&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2017.2655722&rft_dat=%3Cproquest_RIE%3E1902275879%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1902275879&rft_id=info:pmid/&rft_ieee_id=7850975&rfr_iscdi=true