Architecting the Last-Level Cache for GPUs using STT-RAM Technology

Bibliographic details
Published in:	ACM transactions on design automation of electronic systems 2015-09, Vol.20 (4), p.1-24
Main authors:	Samavatian, Mohammad Hossein; Arjomand, Mohammad; Bashizade, Ramin; Sarbazi-Azad, Hamid
Format:	Article
Language:	eng
Subject terms:	Architecture; Automation; Delay; High density; Interprocessor communication; Power consumption; Searching; Thresholds
Online access:	Full text

container_end_page	24
container_issue	4
container_start_page	1
container_title	ACM transactions on design automation of electronic systems
container_volume	20
creator	Samavatian, Mohammad Hossein; Arjomand, Mohammad; Bashizade, Ramin; Sarbazi-Azad, Hamid
description	Future GPUs are expected to need larger L2 caches, given current trends in VLSI technology and GPU architectures toward ever-higher processing-core counts. Larger L2 caches, however, inevitably consume proportionally more power. In this article, based on an investigation of the behavior of GPGPU applications, we present an efficient STT-RAM-based L2 cache architecture for GPUs. Owing to its high density and low power, STT-RAM is well suited to GPUs, where numerous cores leave limited area for on-chip memory banks. STT-RAM does, however, have two important issues that must be addressed: the high energy and the high latency of write operations. Low-retention-time STT-RAMs can reduce the energy and delay of writes, but employing them in GPUs requires a thorough study of the behavior of GPGPU applications. Based on this investigation, we have designed a two-part STT-RAM-based L2 cache with low-retention (LR) and high-retention (HR) parts. The proposed two-part L2 cache uses a dynamic threshold regulator (DTR) to regulate, according to application behavior, the write threshold at which data blocks migrate from HR to LR. In addition, a Data and Access type Aware Cache Search mechanism (DAACS) handles the search for requested data blocks across the two parts of the cache. Compared to a conventional L2 cache architecture with equal on-chip area, the proposed STT-RAM L2 cache architecture can improve IPC by up to 171% (20% on average) and reduce average power consumption by 28.9%.
doi_str_mv	10.1145/2764905
format	Article
identifier	ISSN: 1084-4309
issn	1084-4309 1557-7309
language	eng
recordid	cdi_proquest_miscellaneous_1770334923
source	ACM Digital Library Complete
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T00%3A19%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Architecting%20the%20Last-Level%20Cache%20for%20GPUs%20using%20STT-RAM%20Technology&rft.jtitle=ACM%20transactions%20on%20design%20automation%20of%20electronic%20systems&rft.au=Samavatian,%20Mohammad%20Hossein&rft.date=2015-09-01&rft.volume=20&rft.issue=4&rft.spage=1&rft.epage=24&rft.pages=1-24&rft.issn=1084-4309&rft.eissn=1557-7309&rft_id=info:doi/10.1145/2764905&rft_dat=%3Cproquest_cross%3E1770334923%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1770334923&rft_id=info:pmid/&rfr_iscdi=true
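The description field above outlines a two-part STT-RAM L2 cache in which a dynamic threshold regulator (DTR) controls the write threshold at which blocks migrate from the high-retention (HR) part to the low-retention (LR) part. The Python sketch below is a toy model of that general idea only, not the authors' design. Its fully associative LRU parts, the expiry of LR blocks after `lr_retention` cycles, and the `regulate()` feedback rule are all hypothetical assumptions; the Data and Access type Aware Cache Search (DAACS) is not modeled, and both parts are simply probed, LR first.

```python
from collections import OrderedDict


class TwoPartCache:
    """Toy model of a cache split into a high-retention (HR) and a low-retention (LR) part.

    Hypothetical simplifications, NOT the design from the article:
      * each part is fully associative with LRU replacement;
      * missing blocks are filled into HR; an HR block migrates to LR once its
        write count reaches `threshold`;
      * an LR block not rewritten within `lr_retention` cycles is treated as lost
        (a real design would refresh it or write it back instead);
      * `regulate()` is a crude feedback rule standing in for a threshold regulator.
    """

    def __init__(self, hr_capacity, lr_capacity, threshold=2, lr_retention=1000):
        self.hr = OrderedDict()  # tag -> number of writes seen while in HR
        self.lr = OrderedDict()  # tag -> cycle of the most recent write
        self.hr_capacity = hr_capacity
        self.lr_capacity = lr_capacity
        self.threshold = threshold
        self.lr_retention = lr_retention
        self.stats = {"hits": 0, "misses": 0, "migrations": 0, "expired": 0, "lr_evictions": 0}

    def access(self, tag, is_write, now):
        """Probe LR, then HR, for `tag` at cycle `now`; return True on a hit."""
        if tag in self.lr:
            if now - self.lr[tag] > self.lr_retention:
                # Retention window exceeded: treat the LR copy as lost.
                del self.lr[tag]
                self.stats["expired"] += 1
            else:
                self.stats["hits"] += 1
                if is_write:
                    self.lr[tag] = now  # a write restarts the retention window
                self.lr.move_to_end(tag)
                return True
        if tag in self.hr:
            self.stats["hits"] += 1
            self.hr.move_to_end(tag)
            if is_write:
                self._record_hr_write(tag, now)
            return True
        # Miss: fill the block into HR, evicting the LRU HR block if HR is full.
        self.stats["misses"] += 1
        if len(self.hr) >= self.hr_capacity:
            self.hr.popitem(last=False)
        self.hr[tag] = 0
        if is_write:
            self._record_hr_write(tag, now)
        return False

    def _record_hr_write(self, tag, now):
        """Count a write to an HR block and migrate it once it reaches the threshold."""
        self.hr[tag] += 1
        if self.hr[tag] >= self.threshold:
            del self.hr[tag]
            if len(self.lr) >= self.lr_capacity:
                self.lr.popitem(last=False)  # evict the LRU LR block
                self.stats["lr_evictions"] += 1
            self.lr[tag] = now
            self.stats["migrations"] += 1

    def regulate(self, max_lr_evictions):
        """Hypothetical rule: raise the threshold if LR thrashed, otherwise lower it (min 1)."""
        if self.stats["lr_evictions"] > max_lr_evictions:
            self.threshold += 1
        elif self.threshold > 1:
            self.threshold -= 1
        self.stats["lr_evictions"] = 0


if __name__ == "__main__":
    cache = TwoPartCache(hr_capacity=4, lr_capacity=2, threshold=2, lr_retention=10)
    cache.access(0xA, is_write=True, now=0)    # miss: filled into HR with one write
    cache.access(0xA, is_write=True, now=1)    # HR hit: second write migrates it to LR
    cache.access(0xA, is_write=False, now=5)   # LR hit, still within retention
    cache.access(0xA, is_write=False, now=20)  # LR copy expired: miss, refilled into HR
    print(cache.stats)
    # {'hits': 2, 'misses': 2, 'migrations': 1, 'expired': 1, 'lr_evictions': 0}
```

Routing write-heavy blocks to LR mirrors the abstract's point that low-retention STT-RAM cells reduce write energy and latency; the price, modeled here only crudely as expiry, is that LR data must be rewritten (or, in a real design, refreshed) before its retention window ends, which is why the migration threshold matters.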