RowClone: Accelerating Data Movement and Initialization Using DRAM
In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do n...
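Main authors: | Seshadri, Vivek; Kim, Yoongu; Fallin, Chris; Lee, Donghyuk; Ausavarungnirun, Rachata; Pekhimenko, Gennady; Luo, Yixin; Mutlu, Onur; Gibbons, Phillip B; Kozuch, Michael A; Mowry, Todd C |
---|---|
Format: | Article |
Language: | eng |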
creator | Seshadri, Vivek; Kim, Yoongu; Fallin, Chris; Lee, Donghyuk; Ausavarungnirun, Rachata; Pekhimenko, Gennady; Luo, Yixin; Mutlu, Onur; Gibbons, Phillip B; Kozuch, Michael A; Mowry, Todd C |
---|---|
description | In existing systems, to perform any bulk data movement operation (copy or
initialization), the data has to first be read into the on-chip processor, all
the way into the L1 cache, and the result of the operation must be written back
to main memory. This is despite the fact that these operations do not involve
any actual computation. RowClone exploits the organization and operation of
commodity DRAM to perform these operations completely inside DRAM using two
mechanisms. The first mechanism, Fast Parallel Mode, copies data between two
rows inside the same DRAM subarray by issuing back-to-back activate commands to
the source and the destination row. The second mechanism, Pipelined Serial
Mode, transfers cache lines between two banks using the shared internal bus.
RowClone significantly reduces the raw latency and energy consumption of bulk
data copy and initialization. This reduction directly translates to improvement
in performance and energy efficiency of systems running copy or
initialization-intensive workloads |
doi_str_mv | 10.48550/arxiv.1805.03502 |
format | Article |
date | 2018-05-07 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1805.03502 |
language | eng |
recordid | cdi_arxiv_primary_1805_03502 |
source | arXiv.org |
subjects | Computer Science - Hardware Architecture |
title | RowClone: Accelerating Data Movement and Initialization Using DRAM |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T12%3A51%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RowClone:%20Accelerating%20Data%20Movement%20and%20Initialization%20Using%20DRAM&rft.au=Seshadri,%20Vivek&rft.date=2018-05-07&rft_id=info:doi/10.48550/arxiv.1805.03502&rft_dat=%3Carxiv_GOX%3E1805_03502%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
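
The Fast Parallel Mode described in the abstract (two back-to-back activate commands to a source and a destination row in the same subarray) can be illustrated with a toy model. The sketch below is not the authors' implementation. It assumes, as a simplification, that a subarray has a single row buffer, that the first activate latches the source row into it, and that a second activate issued without an intervening precharge lets the buffer contents overwrite the destination row. The names `Subarray` and `fpm_copy` are hypothetical, and no DRAM timing is modeled.

```python
# Toy model only: real DRAM timing, sense-amplifier behaviour and command
# encoding are far more involved. All names here are hypothetical.


class Subarray:
    """A DRAM subarray: a set of rows sharing one row buffer."""

    def __init__(self, num_rows, row_size):
        self.rows = [bytearray(row_size) for _ in range(num_rows)]
        self.row_buffer = None  # None means the subarray is precharged

    def activate(self, row):
        if self.row_buffer is None:
            # First ACTIVATE: latch the row's contents into the row buffer.
            self.row_buffer = bytearray(self.rows[row])
        else:
            # Back-to-back ACTIVATE with no PRECHARGE in between (assumed
            # behaviour): the buffer still holds the earlier row and
            # overwrites the newly activated one.
            self.rows[row][:] = self.row_buffer

    def precharge(self):
        # Close the open row so the next ACTIVATE starts fresh.
        self.row_buffer = None


def fpm_copy(subarray, src, dst):
    """Copy row src to row dst inside one subarray; return commands issued."""
    subarray.activate(src)
    subarray.activate(dst)
    subarray.precharge()
    return ["ACTIVATE", "ACTIVATE", "PRECHARGE"]


sub = Subarray(num_rows=4, row_size=8)
sub.rows[0][:] = b"ROWCLONE"
commands = fpm_copy(sub, src=0, dst=2)
```

In this model the whole row moves with three commands and no data leaves the subarray, which is the contrast the abstract draws with reading the data through the processor caches and writing it back.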