Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares

We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRA...
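The mechanism the abstract describes (a small buffer and a simple ALU per bank, commands that fill the buffer and feed data to the ALU, and one result returned per set of commands) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' design: the names `Bank`, `fill_buffer`, `compare_range`, `read_result`, `host_scan` and `buffered_scan` are hypothetical, the compare is assumed to be an equality test with "count matches" as the follow-up op, and the transfer counts are a simplified word-level tally rather than anything measured in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bank:
    """One DRAM bank with a hypothetical buffered compare unit."""
    words: List[int]
    key_buffer: int = 0   # small per-bank buffer holding the search key
    match_count: int = 0  # result accumulated by the per-bank ALU

    def fill_buffer(self, key: int) -> None:
        # Hypothetical command: load the key into the bank's buffer.
        self.key_buffer = key
        self.match_count = 0

    def compare_range(self, start: int, end: int) -> None:
        # Hypothetical command: compare each word with the buffered key,
        # then apply the op (here: count the matches) inside the bank.
        for word in self.words[start:end]:
            if word == self.key_buffer:
                self.match_count += 1

    def read_result(self) -> int:
        # Hypothetical command: return one result for the whole command set.
        return self.match_count


def host_scan(banks: List[Bank], key: int) -> Tuple[int, int]:
    """Baseline: every word crosses the memory interface to the host."""
    matches = 0
    transferred = 0
    for bank in banks:
        for word in bank.words:
            transferred += 1
            if word == key:
                matches += 1
    return matches, transferred


def buffered_scan(banks: List[Bank], key: int) -> Tuple[int, int]:
    """Toy buffered compare: only keys and results cross the interface."""
    for bank in banks:
        bank.fill_buffer(key)                      # one key sent per bank
    for bank in banks:
        bank.compare_range(0, len(bank.words))     # work stays inside the bank
    matches = sum(bank.read_result() for bank in banks)  # one result per bank
    transferred = 2 * len(banks)
    return matches, transferred


if __name__ == "__main__":
    banks = [Bank([1, 7, 3, 7]), Bank([7, 7, 2, 9]), Bank([4, 5, 6, 7])]
    print(host_scan(banks, 7))      # (5, 12)
    print(buffered_scan(banks, 7))  # (5, 6)
```

In this model the banks are visited one after another; in hardware the per-bank units could work concurrently, which is the parallelism the abstract refers to.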

Full description

Bibliographic details
Published in:	IEEE transactions on very large scale integration (VLSI) systems 2017-06, Vol.25 (6), p.1793-1806
Main authors:	Lee, Jinho; Chung, Jongwook; Ahn, Jung Ho; Choi, Kiyoung
Format:	Article
Language:	eng
Subjects:	Accelerator architectures; Bandwidth; Bandwidths; Buffers; Commands; Data management; Data processing; DRAM chips; Dynamic random access memory; memory architecture; Memory devices; Memory management; Microprocessors; Parallel processing; Performance enhancement; Random access memory; Timing
Online access:	Order full text

container_end_page	1806
container_issue	6
container_start_page	1793
container_title	IEEE transactions on very large scale integration (VLSI) systems
container_volume	25
creator	Lee, Jinho; Chung, Jongwook; Ahn, Jung Ho; Choi, Kiyoung
description	We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that the multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. By exploiting the under-utilized internal bandwidth using 'compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing the parallelism of the buffered compare units in DRAM banks. We present two versions of the buffered compare architecture, a full-scale architecture and a reduced architecture, that trade off performance and energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.
doi_str_mv	10.1109/TVLSI.2017.2655722
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_7850975</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7850975</ieee_id><sourcerecordid>1902275879</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</originalsourceid><addsrcrecordid>eNo9kEFPAjEQhRujiYj-Ab008bzYdre77RERhQSjQdRjs7RTKVl2sd01-u8tQpzLvEzem5l8CF1SMqCUyJvF2-xlOmCEFgOWc14wdoR6NIpExjqOmuRpIhglp-gshDUhNMsk6aH5-FuXX2Xr6g_crgBPnDFQ4-fSl1UFlQsbPK2DM4Dv5sNHPPR65VrQbech4HfXrvBtZy14MHjUbLZlHJ-jE1tWAS4OvY9e78eL0SSZPT1MR8NZopnkbUJtqnl8kYqCcM0hY0CWpdRW2swKCowA5GRpU5FrkEBNLoxdisykGgyRNO2j6_3erW8-OwitWjedr-NJRSVhrOCikNHF9i7tmxA8WLX1blP6H0WJ2rFTf-zUjp06sIuhq33IAcB_oBCcyIKnv9Ytaww</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1902275879</pqid></control><display><type>article</type><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><source>IEEE Electronic Library (IEL)</source><creator>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</creator><creatorcontrib>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</creatorcontrib><description>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. 
By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</description><identifier>ISSN: 1063-8210</identifier><identifier>EISSN: 1557-9999</identifier><identifier>DOI: 10.1109/TVLSI.2017.2655722</identifier><identifier>CODEN: IEVSE9</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accelerator architectures ; Bandwidth ; Bandwidths ; Buffers ; Commands ; Data management ; Data processing ; DRAM chips ; Dynamic random access memory ; memory architecture ; Memory devices ; Memory management ; Microprocessors ; Parallel processing ; Performance enhancement ; Random access memory ; Timing</subject><ispartof>IEEE transactions on very large scale integration (VLSI) systems, 2017-06, Vol.25 (6), p.1793-1806</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</citedby><cites>FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</cites><orcidid>0000-0001-6138-6697</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7850975$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7850975$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lee, Jinho</creatorcontrib><creatorcontrib>Chung, Jongwook</creatorcontrib><creatorcontrib>Ahn, Jung Ho</creatorcontrib><creatorcontrib>Choi, Kiyoung</creatorcontrib><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><title>IEEE transactions on very large scale integration (VLSI) systems</title><addtitle>TVLSI</addtitle><description>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. 
By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</description><subject>Accelerator architectures</subject><subject>Bandwidth</subject><subject>Bandwidths</subject><subject>Buffers</subject><subject>Commands</subject><subject>Data management</subject><subject>Data processing</subject><subject>DRAM chips</subject><subject>Dynamic random access memory</subject><subject>memory architecture</subject><subject>Memory devices</subject><subject>Memory management</subject><subject>Microprocessors</subject><subject>Parallel processing</subject><subject>Performance enhancement</subject><subject>Random access memory</subject><subject>Timing</subject><issn>1063-8210</issn><issn>1557-9999</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFPAjEQhRujiYj-Ab008bzYdre77RERhQSjQdRjs7RTKVl2sd01-u8tQpzLvEzem5l8CF1SMqCUyJvF2-xlOmCEFgOWc14wdoR6NIpExjqOmuRpIhglp-gshDUhNMsk6aH5-FuXX2Xr6g_crgBPnDFQ4-fSl1UFlQsbPK2DM4Dv5sNHPPR65VrQbech4HfXrvBtZy14MHjUbLZlHJ-jE1tWAS4OvY9e78eL0SSZPT1MR8NZopnkbUJtqnl8kYqCcM0hY0CWpdRW2swKCowA5GRpU5FrkEBNLoxdisykGgyRNO2j6_3erW8-OwitWjedr-NJRSVhrOCikNHF9i7tmxA8WLX1blP6H0WJ2rFTf-zUjp06sIuhq33IAcB_oBCcyIKnv9Ytaww</recordid><startdate>20170601</startdate><enddate>20170601</enddate><creator>Lee, Jinho</creator><creator>Chung, Jongwook</creator><creator>Ahn, Jung 
Ho</creator><creator>Choi, Kiyoung</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0001-6138-6697</orcidid></search><sort><creationdate>20170601</creationdate><title>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</title><author>Lee, Jinho ; Chung, Jongwook ; Ahn, Jung Ho ; Choi, Kiyoung</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-1f3c515518705c5e42e0ba9cf9f4f81e20ee60bf386ce9e1d68dfb84d3ced0913</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Accelerator architectures</topic><topic>Bandwidth</topic><topic>Bandwidths</topic><topic>Buffers</topic><topic>Commands</topic><topic>Data management</topic><topic>Data processing</topic><topic>DRAM chips</topic><topic>Dynamic random access memory</topic><topic>memory architecture</topic><topic>Memory devices</topic><topic>Memory management</topic><topic>Microprocessors</topic><topic>Parallel processing</topic><topic>Performance enhancement</topic><topic>Random access memory</topic><topic>Timing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Jinho</creatorcontrib><creatorcontrib>Chung, Jongwook</creatorcontrib><creatorcontrib>Ahn, Jung Ho</creatorcontrib><creatorcontrib>Choi, Kiyoung</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Jinho</au><au>Chung, Jongwook</au><au>Ahn, Jung Ho</au><au>Choi, Kiyoung</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares</atitle><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle><stitle>TVLSI</stitle><date>2017-06-01</date><risdate>2017</risdate><volume>25</volume><issue>6</issue><spage>1793</spage><epage>1806</epage><pages>1793-1806</pages><issn>1063-8210</issn><eissn>1557-9999</eissn><coden>IEVSE9</coden><abstract>We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multibank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. By exploiting the under-utilized internal bandwidth using `compare-n-op' operations, which are frequently used in various applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. We present two versions of buffered compare architecture-full-scale architecture and reduced architecture-in trade of performance and energy. 
The experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TVLSI.2017.2655722</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-6138-6697</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1063-8210
ispartof	IEEE transactions on very large scale integration (VLSI) systems, 2017-06, Vol.25 (6), p.1793-1806
issn	1063-8210 1557-9999
language	eng
recordid	cdi_ieee_primary_7850975
source	IEEE Electronic Library (IEL)
subjects	Accelerator architectures; Bandwidth; Bandwidths; Buffers; Commands; Data management; Data processing; DRAM chips; Dynamic random access memory; memory architecture; Memory devices; Memory management; Microprocessors; Parallel processing; Performance enhancement; Random access memory; Timing
title	Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T04%3A44%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Excavating%20the%20Hidden%20Parallelism%20Inside%20DRAM%20Architectures%20With%20Buffered%20Compares&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Lee,%20Jinho&rft.date=2017-06-01&rft.volume=25&rft.issue=6&rft.spage=1793&rft.epage=1806&rft.pages=1793-1806&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2017.2655722&rft_dat=%3Cproquest_RIE%3E1902275879%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1902275879&rft_id=info:pmid/&rft_ieee_id=7850975&rfr_iscdi=true