Architecting the Last-Level Cache for GPUs using STT-RAM Technology

Future GPUs should have larger L2 caches based on the current trends in VLSI technology and GPU architectures toward increase of processing core count. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, w...

Bibliographic details
Published in: ACM Transactions on Design Automation of Electronic Systems, 2015-09, Vol. 20 (4), p. 1-24
Main authors: Samavatian, Mohammad Hossein; Arjomand, Mohammad; Bashizade, Ramin; Sarbazi-Azad, Hamid
Format: Article
Language: eng
Online access: Full text
container_end_page 24
container_issue 4
container_start_page 1
container_title ACM transactions on design automation of electronic systems
container_volume 20
creator Samavatian, Mohammad Hossein
Arjomand, Mohammad
Bashizade, Ramin
Sarbazi-Azad, Hamid
description Future GPUs should have larger L2 caches, given current trends in VLSI technology and in GPU architectures toward higher processing core counts. Larger L2 caches inevitably consume proportionally more power. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Owing to its high density and low power consumption, STT-RAM can be used in GPUs, where numerous cores leave limited area for on-chip memory banks. STT-RAMs have, however, two important drawbacks that must be addressed: the high energy and the high latency of write operations. Low-retention-time STT-RAMs can reduce the energy and delay of write operations. Nevertheless, employing STT-RAMs with low retention time in GPUs requires a thorough study of the behavior of GPGPU applications. Based on this investigation, we have designed a two-part STT-RAM-based L2 cache with low-retention (LR) and high-retention (HR) parts. The proposed two-part L2 cache uses a dynamic threshold regulator (DTR) to regulate the write threshold for migration of data blocks from HR to LR, based on the behavior of the applications. In addition, a Data and Access type Aware Cache Search mechanism (DAACS) handles the search for requested data blocks in the two parts of the cache. The STT-RAM L2 cache architecture proposed in this article can improve IPC by up to 171% (20% on average) and reduce the average consumed power by 28.9% compared to a conventional L2 cache architecture with equal on-chip area.
doi_str_mv 10.1145/2764905
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1770334923</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1770334923</sourcerecordid><originalsourceid>FETCH-LOGICAL-c258t-83ab59eebc03e838aa94eb45cc7b42bb1984434d934cb2052d744bba8dad7d183</originalsourceid><addsrcrecordid>eNotkFFLwzAUhYMoOKf4F_qmL9GkuVmSx1J0ChVFu-eSpHdbpWtn0g327-3Yns7H4eM8HELuOXviHORzqmZgmLwgEy6lokowczky00Bh5GtyE-MvY0yqmZyQPAt-3Qzoh6ZbJcMak8LGgRa4xzbJrR-LZR-S-dciJrt4dH7Kkn5nH0mJft31bb863JKrpW0j3p1zShavL2X-RovP-XueFdSnUg9UC-ukQXSeCdRCW2sAHUjvlYPUOW40gIDaCPAuZTKtFYBzVte2VjXXYkoeT7vb0P_tMA7Vpoke29Z22O9ixZViQoBJxag-nFQf-hgDLqttaDY2HCrOquNN1fkm8Q8J_Ffb</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1770334923</pqid></control><display><type>article</type><title>Architecting the Last-Level Cache for GPUs using STT-RAM Technology</title><source>ACM Digital Library Complete</source><creator>Samavatian, Mohammad Hossein ; Arjomand, Mohammad ; Bashizade, Ramin ; Sarbazi-Azad, Hamid</creator><creatorcontrib>Samavatian, Mohammad Hossein ; Arjomand, Mohammad ; Bashizade, Ramin ; Sarbazi-Azad, Hamid</creatorcontrib><description>Future GPUs should have larger L2 caches based on the current trends in VLSI technology and GPU architectures toward increase of processing core count. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limited area for on-chip memory banks. They have, however, two important issues, high energy and latency of write operations, that have to be addressed. 
Low retention time STT-RAMs can reduce the energy and delay of write operations. Nevertheless, employing STT-RAMs with low retention time in GPUs requires a thorough study on the behavior of GPGPU applications. Based on this investigation, we have architectured a two-part STT-RAM-based L2 cache with low-retention (LR) and high-retention (HR) parts. The proposed two-part L2 cache exploits a dynamic threshold regulator (DTR) to efficiently regulate the write threshold for migration of the data blocks from HR to LR, based on the behavior of the applications. Also, a Data and Access type Aware Cache Search mechanism (DAACS) is hired for handling the search of the requested data blocks in two parts of the cache. The STT-RAM L2 cache architecture proposed in this article can improve IPC by up to 171% (20% on average), and reduce the average consumed power by 28.9% compared to a conventional L2 cache architecture with equal on-chip area.</description><identifier>ISSN: 1084-4309</identifier><identifier>EISSN: 1557-7309</identifier><identifier>DOI: 10.1145/2764905</identifier><language>eng</language><subject>Architecture ; Automation ; Delay ; High density ; Interprocessor communication ; Power consumption ; Searching ; Thresholds</subject><ispartof>ACM transactions on design automation of electronic systems, 2015-09, Vol.20 (4), p.1-24</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c258t-83ab59eebc03e838aa94eb45cc7b42bb1984434d934cb2052d744bba8dad7d183</citedby><cites>FETCH-LOGICAL-c258t-83ab59eebc03e838aa94eb45cc7b42bb1984434d934cb2052d744bba8dad7d183</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27923,27924</link.rule.ids></links><search><creatorcontrib>Samavatian, Mohammad Hossein</creatorcontrib><creatorcontrib>Arjomand, 
Mohammad</creatorcontrib><creatorcontrib>Bashizade, Ramin</creatorcontrib><creatorcontrib>Sarbazi-Azad, Hamid</creatorcontrib><title>Architecting the Last-Level Cache for GPUs using STT-RAM Technology</title><title>ACM transactions on design automation of electronic systems</title><description>Future GPUs should have larger L2 caches based on the current trends in VLSI technology and GPU architectures toward increase of processing core count. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limited area for on-chip memory banks. They have, however, two important issues, high energy and latency of write operations, that have to be addressed. Low retention time STT-RAMs can reduce the energy and delay of write operations. Nevertheless, employing STT-RAMs with low retention time in GPUs requires a thorough study on the behavior of GPGPU applications. Based on this investigation, we have architectured a two-part STT-RAM-based L2 cache with low-retention (LR) and high-retention (HR) parts. The proposed two-part L2 cache exploits a dynamic threshold regulator (DTR) to efficiently regulate the write threshold for migration of the data blocks from HR to LR, based on the behavior of the applications. Also, a Data and Access type Aware Cache Search mechanism (DAACS) is hired for handling the search of the requested data blocks in two parts of the cache. 
The STT-RAM L2 cache architecture proposed in this article can improve IPC by up to 171% (20% on average), and reduce the average consumed power by 28.9% compared to a conventional L2 cache architecture with equal on-chip area.</description><subject>Architecture</subject><subject>Automation</subject><subject>Delay</subject><subject>High density</subject><subject>Interprocessor communication</subject><subject>Power consumption</subject><subject>Searching</subject><subject>Thresholds</subject><issn>1084-4309</issn><issn>1557-7309</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNotkFFLwzAUhYMoOKf4F_qmL9GkuVmSx1J0ChVFu-eSpHdbpWtn0g327-3Yns7H4eM8HELuOXviHORzqmZgmLwgEy6lokowczky00Bh5GtyE-MvY0yqmZyQPAt-3Qzoh6ZbJcMak8LGgRa4xzbJrR-LZR-S-dciJrt4dH7Kkn5nH0mJft31bb863JKrpW0j3p1zShavL2X-RovP-XueFdSnUg9UC-ukQXSeCdRCW2sAHUjvlYPUOW40gIDaCPAuZTKtFYBzVte2VjXXYkoeT7vb0P_tMA7Vpoke29Z22O9ixZViQoBJxag-nFQf-hgDLqttaDY2HCrOquNN1fkm8Q8J_Ffb</recordid><startdate>20150901</startdate><enddate>20150901</enddate><creator>Samavatian, Mohammad Hossein</creator><creator>Arjomand, Mohammad</creator><creator>Bashizade, Ramin</creator><creator>Sarbazi-Azad, Hamid</creator><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20150901</creationdate><title>Architecting the Last-Level Cache for GPUs using STT-RAM Technology</title><author>Samavatian, Mohammad Hossein ; Arjomand, Mohammad ; Bashizade, Ramin ; Sarbazi-Azad, 
Hamid</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c258t-83ab59eebc03e838aa94eb45cc7b42bb1984434d934cb2052d744bba8dad7d183</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Architecture</topic><topic>Automation</topic><topic>Delay</topic><topic>High density</topic><topic>Interprocessor communication</topic><topic>Power consumption</topic><topic>Searching</topic><topic>Thresholds</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Samavatian, Mohammad Hossein</creatorcontrib><creatorcontrib>Arjomand, Mohammad</creatorcontrib><creatorcontrib>Bashizade, Ramin</creatorcontrib><creatorcontrib>Sarbazi-Azad, Hamid</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>ACM transactions on design automation of electronic systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Samavatian, Mohammad Hossein</au><au>Arjomand, Mohammad</au><au>Bashizade, Ramin</au><au>Sarbazi-Azad, Hamid</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Architecting the Last-Level Cache for GPUs using STT-RAM Technology</atitle><jtitle>ACM transactions on design automation of electronic 
systems</jtitle><date>2015-09-01</date><risdate>2015</risdate><volume>20</volume><issue>4</issue><spage>1</spage><epage>24</epage><pages>1-24</pages><issn>1084-4309</issn><eissn>1557-7309</eissn><abstract>Future GPUs should have larger L2 caches based on the current trends in VLSI technology and GPU architectures toward increase of processing core count. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limited area for on-chip memory banks. They have, however, two important issues, high energy and latency of write operations, that have to be addressed. Low retention time STT-RAMs can reduce the energy and delay of write operations. Nevertheless, employing STT-RAMs with low retention time in GPUs requires a thorough study on the behavior of GPGPU applications. Based on this investigation, we have architectured a two-part STT-RAM-based L2 cache with low-retention (LR) and high-retention (HR) parts. The proposed two-part L2 cache exploits a dynamic threshold regulator (DTR) to efficiently regulate the write threshold for migration of the data blocks from HR to LR, based on the behavior of the applications. Also, a Data and Access type Aware Cache Search mechanism (DAACS) is hired for handling the search of the requested data blocks in two parts of the cache. The STT-RAM L2 cache architecture proposed in this article can improve IPC by up to 171% (20% on average), and reduce the average consumed power by 28.9% compared to a conventional L2 cache architecture with equal on-chip area.</abstract><doi>10.1145/2764905</doi><tpages>24</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1084-4309
ispartof ACM transactions on design automation of electronic systems, 2015-09, Vol.20 (4), p.1-24
issn 1084-4309
1557-7309
language eng
recordid cdi_proquest_miscellaneous_1770334923
source ACM Digital Library Complete
subjects Architecture
Automation
Delay
High density
Interprocessor communication
Power consumption
Searching
Thresholds
title Architecting the Last-Level Cache for GPUs using STT-RAM Technology
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T00%3A19%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Architecting%20the%20Last-Level%20Cache%20for%20GPUs%20using%20STT-RAM%20Technology&rft.jtitle=ACM%20transactions%20on%20design%20automation%20of%20electronic%20systems&rft.au=Samavatian,%20Mohammad%20Hossein&rft.date=2015-09-01&rft.volume=20&rft.issue=4&rft.spage=1&rft.epage=24&rft.pages=1-24&rft.issn=1084-4309&rft.eissn=1557-7309&rft_id=info:doi/10.1145/2764905&rft_dat=%3Cproquest_cross%3E1770334923%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1770334923&rft_id=info:pmid/&rfr_iscdi=true
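
Illustrative sketch (not part of the catalog record): the abstract describes a two-part L2 cache in which data blocks migrate from the high-retention (HR) part to the low-retention (LR) part once they cross a write threshold, and a dynamic threshold regulator (DTR) adjusts that threshold at run time. The record gives no algorithmic detail, so the toy Python model below is only one possible reading. The class name, the FIFO eviction policy, the per-block write counters, and the eviction-based adjustment rule are all assumptions made for illustration, and DAACS and read accesses are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPartCache:
    """Toy model of an HR/LR cache; every policy here is a hypothetical choice."""

    write_threshold: int = 2                 # assumed starting threshold
    lr_capacity: int = 2                     # assumed LR size, in blocks
    hr: dict = field(default_factory=dict)   # block address -> writes seen in HR
    lr: list = field(default_factory=list)   # block addresses, oldest first
    evictions: int = 0                       # LR evictions since last regulate()

    def write(self, addr: int) -> str:
        """Record a write and return the part ("HR" or "LR") that now holds the block."""
        if addr in self.lr:
            return "LR"
        count = self.hr.get(addr, 0) + 1
        if count < self.write_threshold:
            self.hr[addr] = count
            return "HR"
        # The block is write-intensive enough: migrate it from HR to LR.
        self.hr.pop(addr, None)
        if len(self.lr) >= self.lr_capacity:
            # Assumed FIFO policy: the oldest LR block goes back to HR.
            evicted = self.lr.pop(0)
            self.hr[evicted] = 0
            self.evictions += 1
        self.lr.append(addr)
        return "LR"

    def regulate(self, limit: int) -> None:
        """Assumed DTR rule: raise the threshold when LR thrashes, else relax it."""
        if self.evictions > limit:
            self.write_threshold += 1
        elif self.write_threshold > 1:
            self.write_threshold -= 1
        self.evictions = 0
```

In this sketch a block written often enough moves to LR, where (per the abstract) writes cost less energy and time, while the regulator raises the threshold if too many blocks compete for the small LR part. A real design would also have to handle refresh or expiry of LR blocks, which the toy model ignores.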