ContextPreRF: Enhancing the Performance and Energy of GPUs With Nonuniform Register Access

Register files are a key data storage unit that impacts instruction throughput for graphics processing units (GPUs). Typically, GPU register files are quite large to accommodate many concurrent threads and are implemented using the same SRAM technology as the on-chip cache. We propose contextrf, a n...

Bibliographic details
Published in:	IEEE transactions on very large scale integration (VLSI) systems 2016-01, Vol.24 (1), p.343-347
Main authors:	Moeng, Michael; Haifeng Xu; Melhem, Rami; Jones, Alex K.
Format:	Article
Language:	English
Subjects:	Computer architecture; Context; Domain wall memory; GPU; Graphics processing units; Hides; hybrid memory; Instruction sets; Nonuniform; Performance enhancement; Performance evaluation; Random access memory; register file; Registers; Static random access memory; Switches; Switching; Very large scale integration

container_end_page	347
container_issue	1
container_start_page	343
container_title	IEEE transactions on very large scale integration (VLSI) systems
container_volume	24
creator	Moeng, Michael; Haifeng Xu; Melhem, Rami; Jones, Alex K.
description	Register files are a key data storage unit that impacts instruction throughput for graphics processing units (GPUs). Typically, GPU register files are quite large to accommodate many concurrent threads and are implemented using the same SRAM technology as the on-chip cache. We propose contextrf, a new register file architecture that efficiently leverages register files with nonuniform access characteristics, including hybrid SRAM/DRAM (S/D) and spintronic domain-wall memories (DWMs). Contextrf allows greater-capacity register files to be implemented in the same area within the GPU, with reduced power consumption. We also propose contextPreRF, a hardware preswitch scheme to hide switching delays: as soon as a register request is queued, the nonuniform access memories containing the corresponding register are sent a preemptive switch request. Thus, our scheme transparently hides the penalties of switching between register contexts. After replacing the register file SRAM with S/D, we can reduce energy by 37%, with a 1.4% average performance drop. Employing DWM, we reduce register file energy by 74%, with a 0.4% average performance penalty. For the denser DWM, we model converting the saved area into additional registers, cache, and shared memory; this improves performance by 13.5% over the baseline SRAM register file.
doi_str_mv	10.1109/TVLSI.2015.2397876
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_1752040186</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7070716</ieee_id><sourcerecordid>1786199546</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-9fbebda523e5dea3c2ff699a08334eebc70eabcd49e796aa8d2cbde0ea8fa5503</originalsourceid><addsrcrecordid>eNpdkE1PwkAQhhujiYj-Ab1s4sVLcT-6bdcbIYAkRAmCJl6a7XYWSmCLu20i_96tEA_OHGYy87yTyRsEtwT3CMHicfE-fZv0KCa8R5lI0iQ-CzqE8yQUPs59j2MWppTgy-DKuQ3GJIoE7gSfg8rU8F3PLMxHT2ho1tKo0qxQvQY0A6sru_MTQNIUfgt2dUCVRuPZ0qGPsl6jl8o0pmwxNIdV6WqwqK8UOHcdXGi5dXBzqt1gORouBs_h9HU8GfSnoWI0rUOhc8gLySkDXoBkimodCyFxylgEkKsEg8xVEQlIRCxlWlCVF-CHqZacY9YNHo5397b6asDV2a50CrZbaaBqXEaSNCZC8Cj26P0_dFM11vjvPMUpjjBJW4oeKWUr5yzobG_LnbSHjOCstTv7tTtr7c5OdnvR3VFUAsCfIME-Scx-AOogfPg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1752040186</pqid></control><display><type>article</type><title>ContextPreRF: Enhancing the Performance and Energy of GPUs With Nonuniform Register Access</title><source>IEEE Electronic Library (IEL)</source><creator>Moeng, Michael ; Haifeng Xu ; Melhem, Rami ; Jones, Alex K.</creator><creatorcontrib>Moeng, Michael ; Haifeng Xu ; Melhem, Rami ; Jones, Alex K.</creatorcontrib><description>Register files are a key data storage unit that impacts instruction throughput for graphics processing units (GPUs). Typically, GPU register files are quite large to accommodate many concurrent threads and are implemented using the same SRAM technology as the on-chip cache. We propose contextrf, a new register file architecture that efficiently leverages register files with nonuniform access characteristics, including hybrid SRAM/DRAM (S/D) and spintronic domain-wall memories (DWMs). Contextrf allows greater-capacity register files to be implemented in the same area within the GPU, with reduced power consumption. 
We also propose contextPreRF, a hardware preswitch scheme to hide switching delays-as soon as a register request is queued, the nonuniform access memories containing the corresponding register are sent a preemptive switch request. Thus, our scheme transparently hides the penalties of switching between register contexts. After replacing the register file SRAM with S/D, we can reduce energy by 37%, with a 1.4% average performance drop. Employing DWM, we reduce register file energy by 74%, with a 0.4% average performance penalty. For the denser DWM, we model converting the saved area into additional registers, cache, and shared memory-this improves performance by 13.5% over the baseline SRAM register file.</description><identifier>ISSN: 1063-8210</identifier><identifier>EISSN: 1557-9999</identifier><identifier>DOI: 10.1109/TVLSI.2015.2397876</identifier><identifier>CODEN: IEVSE9</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Computer architecture ; Context ; Domain wall memory ; GPU ; Graphics processing units ; Hides ; hybrid memory ; Instruction sets ; Nonuniform ; Performance enhancement ; Performance evaluation ; Random access memory ; register file ; Registers ; Static random access memory ; Switches ; Switching ; Very large scale integration</subject><ispartof>IEEE transactions on very large scale integration (VLSI) systems, 2016-01, Vol.24 (1), p.343-347</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2016</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-9fbebda523e5dea3c2ff699a08334eebc70eabcd49e796aa8d2cbde0ea8fa5503</citedby><cites>FETCH-LOGICAL-c328t-9fbebda523e5dea3c2ff699a08334eebc70eabcd49e796aa8d2cbde0ea8fa5503</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7070716$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7070716$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Moeng, Michael</creatorcontrib><creatorcontrib>Haifeng Xu</creatorcontrib><creatorcontrib>Melhem, Rami</creatorcontrib><creatorcontrib>Jones, Alex K.</creatorcontrib><title>ContextPreRF: Enhancing the Performance and Energy of GPUs With Nonuniform Register Access</title><title>IEEE transactions on very large scale integration (VLSI) systems</title><addtitle>TVLSI</addtitle><description>Register files are a key data storage unit that impacts instruction throughput for graphics processing units (GPUs). Typically, GPU register files are quite large to accommodate many concurrent threads and are implemented using the same SRAM technology as the on-chip cache. We propose contextrf, a new register file architecture that efficiently leverages register files with nonuniform access characteristics, including hybrid SRAM/DRAM (S/D) and spintronic domain-wall memories (DWMs). Contextrf allows greater-capacity register files to be implemented in the same area within the GPU, with reduced power consumption. 
We also propose contextPreRF, a hardware preswitch scheme to hide switching delays-as soon as a register request is queued, the nonuniform access memories containing the corresponding register are sent a preemptive switch request. Thus, our scheme transparently hides the penalties of switching between register contexts. After replacing the register file SRAM with S/D, we can reduce energy by 37%, with a 1.4% average performance drop. Employing DWM, we reduce register file energy by 74%, with a 0.4% average performance penalty. For the denser DWM, we model converting the saved area into additional registers, cache, and shared memory-this improves performance by 13.5% over the baseline SRAM register file.</description><subject>Computer architecture</subject><subject>Context</subject><subject>Domain wall memory</subject><subject>GPU</subject><subject>Graphics processing units</subject><subject>Hides</subject><subject>hybrid memory</subject><subject>Instruction sets</subject><subject>Nonuniform</subject><subject>Performance enhancement</subject><subject>Performance evaluation</subject><subject>Random access memory</subject><subject>register file</subject><subject>Registers</subject><subject>Static random access memory</subject><subject>Switches</subject><subject>Switching</subject><subject>Very large scale 
integration</subject><issn>1063-8210</issn><issn>1557-9999</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE1PwkAQhhujiYj-Ab1s4sVLcT-6bdcbIYAkRAmCJl6a7XYWSmCLu20i_96tEA_OHGYy87yTyRsEtwT3CMHicfE-fZv0KCa8R5lI0iQ-CzqE8yQUPs59j2MWppTgy-DKuQ3GJIoE7gSfg8rU8F3PLMxHT2ho1tKo0qxQvQY0A6sru_MTQNIUfgt2dUCVRuPZ0qGPsl6jl8o0pmwxNIdV6WqwqK8UOHcdXGi5dXBzqt1gORouBs_h9HU8GfSnoWI0rUOhc8gLySkDXoBkimodCyFxylgEkKsEg8xVEQlIRCxlWlCVF-CHqZacY9YNHo5397b6asDV2a50CrZbaaBqXEaSNCZC8Cj26P0_dFM11vjvPMUpjjBJW4oeKWUr5yzobG_LnbSHjOCstTv7tTtr7c5OdnvR3VFUAsCfIME-Scx-AOogfPg</recordid><startdate>201601</startdate><enddate>201601</enddate><creator>Moeng, Michael</creator><creator>Haifeng Xu</creator><creator>Melhem, Rami</creator><creator>Jones, Alex K.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>201601</creationdate><title>ContextPreRF: Enhancing the Performance and Energy of GPUs With Nonuniform Register Access</title><author>Moeng, Michael ; Haifeng Xu ; Melhem, Rami ; Jones, Alex K.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-9fbebda523e5dea3c2ff699a08334eebc70eabcd49e796aa8d2cbde0ea8fa5503</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Computer architecture</topic><topic>Context</topic><topic>Domain wall memory</topic><topic>GPU</topic><topic>Graphics processing units</topic><topic>Hides</topic><topic>hybrid memory</topic><topic>Instruction sets</topic><topic>Nonuniform</topic><topic>Performance enhancement</topic><topic>Performance evaluation</topic><topic>Random access 
memory</topic><topic>register file</topic><topic>Registers</topic><topic>Static random access memory</topic><topic>Switches</topic><topic>Switching</topic><topic>Very large scale integration</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Moeng, Michael</creatorcontrib><creatorcontrib>Haifeng Xu</creatorcontrib><creatorcontrib>Melhem, Rami</creatorcontrib><creatorcontrib>Jones, Alex K.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Moeng, Michael</au><au>Haifeng Xu</au><au>Melhem, Rami</au><au>Jones, Alex K.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ContextPreRF: Enhancing the Performance and Energy of GPUs With Nonuniform Register Access</atitle><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle><stitle>TVLSI</stitle><date>2016-01</date><risdate>2016</risdate><volume>24</volume><issue>1</issue><spage>343</spage><epage>347</epage><pages>343-347</pages><issn>1063-8210</issn><eissn>1557-9999</eissn><coden>IEVSE9</coden><abstract>Register files are a key data storage unit that impacts instruction throughput for graphics processing units (GPUs). 
Typically, GPU register files are quite large to accommodate many concurrent threads and are implemented using the same SRAM technology as the on-chip cache. We propose contextrf, a new register file architecture that efficiently leverages register files with nonuniform access characteristics, including hybrid SRAM/DRAM (S/D) and spintronic domain-wall memories (DWMs). Contextrf allows greater-capacity register files to be implemented in the same area within the GPU, with reduced power consumption. We also propose contextPreRF, a hardware preswitch scheme to hide switching delays-as soon as a register request is queued, the nonuniform access memories containing the corresponding register are sent a preemptive switch request. Thus, our scheme transparently hides the penalties of switching between register contexts. After replacing the register file SRAM with S/D, we can reduce energy by 37%, with a 1.4% average performance drop. Employing DWM, we reduce register file energy by 74%, with a 0.4% average performance penalty. For the denser DWM, we model converting the saved area into additional registers, cache, and shared memory-this improves performance by 13.5% over the baseline SRAM register file.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TVLSI.2015.2397876</doi><tpages>5</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1063-8210
ispartof	IEEE transactions on very large scale integration (VLSI) systems, 2016-01, Vol.24 (1), p.343-347
issn	1063-8210 1557-9999
language	eng
recordid	cdi_proquest_journals_1752040186
source	IEEE Electronic Library (IEL)
subjects	Computer architecture; Context; Domain wall memory; GPU; Graphics processing units; Hides; hybrid memory; Instruction sets; Nonuniform; Performance enhancement; Performance evaluation; Random access memory; register file; Registers; Static random access memory; Switches; Switching; Very large scale integration
title	ContextPreRF: Enhancing the Performance and Energy of GPUs With Nonuniform Register Access
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T11%3A34%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ContextPreRF:%20Enhancing%20the%20Performance%20and%20Energy%20of%20GPUs%20With%20Nonuniform%20Register%20Access&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Moeng,%20Michael&rft.date=2016-01&rft.volume=24&rft.issue=1&rft.spage=343&rft.epage=347&rft.pages=343-347&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2015.2397876&rft_dat=%3Cproquest_RIE%3E1786199546%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1752040186&rft_id=info:pmid/&rft_ieee_id=7070716&rfr_iscdi=true