RowClone: Accelerating Data Movement and Initialization Using DRAM

In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do n...

Bibliographic details
Main authors: Seshadri, Vivek; Kim, Yoongu; Fallin, Chris; Lee, Donghyuk; Ausavarungnirun, Rachata; Pekhimenko, Gennady; Luo, Yixin; Mutlu, Onur; Gibbons, Phillip B; Kozuch, Michael A; Mowry, Todd C
Format: Article
Language: English
Subjects:
Online access: Order full text
creator Seshadri, Vivek
Kim, Yoongu
Fallin, Chris
Lee, Donghyuk
Ausavarungnirun, Rachata
Pekhimenko, Gennady
Luo, Yixin
Mutlu, Onur
Gibbons, Phillip B
Kozuch, Michael A
Mowry, Todd C
description In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination row. The second mechanism, Pipelined Serial Mode, transfers cache lines between two banks using the shared internal bus. RowClone significantly reduces the raw latency and energy consumption of bulk data copy and initialization. This reduction directly translates to improvement in performance and energy efficiency of systems running copy or initialization-intensive workloads.
doi_str_mv 10.48550/arxiv.1805.03502
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1805_03502</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1805_03502</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-ff67f2c15389a033ed21077e53b031a04a342ba305e9fdb6d34b0cbd99b53b0f3</originalsourceid><addsrcrecordid>eNotz8tOwzAQhWFvWKDCA7DCL5Aw9sS5sAvhVqkVUtWuo3E8riylDkqjcnl6aGB1Np-O9AtxoyDNSmPgjsbPcEpVCSYFNKAvxcNm-Gj6IfK9rLuOex5pCnEvH2kiuR5OfOA4SYpOLmOYAvXh-xcMUe6OM9vU6ytx4ak_8vX_LsT2-WnbvCart5dlU68SygudeJ8XXnfKYFkRILLTCoqCDVpARZARZtoSguHKO5s7zCx01lWVPROPC3H7dztHtO9jOND41Z5j2jkGfwAu1EO6</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>RowClone: Accelerating Data Movement and Initialization Using DRAM</title><source>arXiv.org</source><creator>Seshadri, Vivek ; Kim, Yoongu ; Fallin, Chris ; Lee, Donghyuk ; Ausavarungnirun, Rachata ; Pekhimenko, Gennady ; Luo, Yixin ; Mutlu, Onur ; Gibbons, Phillip B ; Kozuch, Michael A ; Mowry, Todd C</creator><creatorcontrib>Seshadri, Vivek ; Kim, Yoongu ; Fallin, Chris ; Lee, Donghyuk ; Ausavarungnirun, Rachata ; Pekhimenko, Gennady ; Luo, Yixin ; Mutlu, Onur ; Gibbons, Phillip B ; Kozuch, Michael A ; Mowry, Todd C</creatorcontrib><description>In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination row. 
The second mechanism, Pipelined Serial Mode, transfers cache lines between two banks using the shared internal bus. RowClone significantly reduces the raw latency and energy consumption of bulk data copy and initialization. This reduction directly translates to improvement in performance and energy efficiency of systems running copy or initialization-intensive workloads</description><identifier>DOI: 10.48550/arxiv.1805.03502</identifier><language>eng</language><subject>Computer Science - Hardware Architecture</subject><creationdate>2018-05</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1805.03502$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1805.03502$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Seshadri, Vivek</creatorcontrib><creatorcontrib>Kim, Yoongu</creatorcontrib><creatorcontrib>Fallin, Chris</creatorcontrib><creatorcontrib>Lee, Donghyuk</creatorcontrib><creatorcontrib>Ausavarungnirun, Rachata</creatorcontrib><creatorcontrib>Pekhimenko, Gennady</creatorcontrib><creatorcontrib>Luo, Yixin</creatorcontrib><creatorcontrib>Mutlu, Onur</creatorcontrib><creatorcontrib>Gibbons, Phillip B</creatorcontrib><creatorcontrib>Kozuch, Michael A</creatorcontrib><creatorcontrib>Mowry, Todd C</creatorcontrib><title>RowClone: Accelerating Data Movement and Initialization Using DRAM</title><description>In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the 
result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination row. The second mechanism, Pipelined Serial Mode, transfers cache lines between two banks using the shared internal bus. RowClone significantly reduces the raw latency and energy consumption of bulk data copy and initialization. This reduction directly translates to improvement in performance and energy efficiency of systems running copy or initialization-intensive workloads</description><subject>Computer Science - Hardware Architecture</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz8tOwzAQhWFvWKDCA7DCL5Aw9sS5sAvhVqkVUtWuo3E8riylDkqjcnl6aGB1Np-O9AtxoyDNSmPgjsbPcEpVCSYFNKAvxcNm-Gj6IfK9rLuOex5pCnEvH2kiuR5OfOA4SYpOLmOYAvXh-xcMUe6OM9vU6ytx4ak_8vX_LsT2-WnbvCart5dlU68SygudeJ8XXnfKYFkRILLTCoqCDVpARZARZtoSguHKO5s7zCx01lWVPROPC3H7dztHtO9jOND41Z5j2jkGfwAu1EO6</recordid><startdate>20180507</startdate><enddate>20180507</enddate><creator>Seshadri, Vivek</creator><creator>Kim, Yoongu</creator><creator>Fallin, Chris</creator><creator>Lee, Donghyuk</creator><creator>Ausavarungnirun, Rachata</creator><creator>Pekhimenko, Gennady</creator><creator>Luo, Yixin</creator><creator>Mutlu, Onur</creator><creator>Gibbons, Phillip B</creator><creator>Kozuch, Michael A</creator><creator>Mowry, Todd C</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20180507</creationdate><title>RowClone: Accelerating Data Movement and Initialization Using DRAM</title><author>Seshadri, Vivek ; Kim, Yoongu ; Fallin, 
Chris ; Lee, Donghyuk ; Ausavarungnirun, Rachata ; Pekhimenko, Gennady ; Luo, Yixin ; Mutlu, Onur ; Gibbons, Phillip B ; Kozuch, Michael A ; Mowry, Todd C</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-ff67f2c15389a033ed21077e53b031a04a342ba305e9fdb6d34b0cbd99b53b0f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Computer Science - Hardware Architecture</topic><toplevel>online_resources</toplevel><creatorcontrib>Seshadri, Vivek</creatorcontrib><creatorcontrib>Kim, Yoongu</creatorcontrib><creatorcontrib>Fallin, Chris</creatorcontrib><creatorcontrib>Lee, Donghyuk</creatorcontrib><creatorcontrib>Ausavarungnirun, Rachata</creatorcontrib><creatorcontrib>Pekhimenko, Gennady</creatorcontrib><creatorcontrib>Luo, Yixin</creatorcontrib><creatorcontrib>Mutlu, Onur</creatorcontrib><creatorcontrib>Gibbons, Phillip B</creatorcontrib><creatorcontrib>Kozuch, Michael A</creatorcontrib><creatorcontrib>Mowry, Todd C</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Seshadri, Vivek</au><au>Kim, Yoongu</au><au>Fallin, Chris</au><au>Lee, Donghyuk</au><au>Ausavarungnirun, Rachata</au><au>Pekhimenko, Gennady</au><au>Luo, Yixin</au><au>Mutlu, Onur</au><au>Gibbons, Phillip B</au><au>Kozuch, Michael A</au><au>Mowry, Todd C</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>RowClone: Accelerating Data Movement and Initialization Using DRAM</atitle><date>2018-05-07</date><risdate>2018</risdate><abstract>In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. 
This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination row. The second mechanism, Pipelined Serial Mode, transfers cache lines between two banks using the shared internal bus. RowClone significantly reduces the raw latency and energy consumption of bulk data copy and initialization. This reduction directly translates to improvement in performance and energy efficiency of systems running copy or initialization-intensive workloads</abstract><doi>10.48550/arxiv.1805.03502</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1805.03502
language eng
recordid cdi_arxiv_primary_1805_03502
source arXiv.org
subjects Computer Science - Hardware Architecture
title RowClone: Accelerating Data Movement and Initialization Using DRAM
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T12%3A51%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RowClone:%20Accelerating%20Data%20Movement%20and%20Initialization%20Using%20DRAM&rft.au=Seshadri,%20Vivek&rft.date=2018-05-07&rft_id=info:doi/10.48550/arxiv.1805.03502&rft_dat=%3Carxiv_GOX%3E1805_03502%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true