GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance

Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the st...

Detailed description

Bibliographic details
Published in: IEEE computer architecture letters, 2024-07, Vol. 23 (2), p. 235-238
Main authors: Cha, Hanna, Lee, Sungchul, Ha, Yeonan, Jang, Hanhwi, Kim, Joonsung, Kim, Youngsok
Format: Article
Language: eng
container_end_page 238
container_issue 2
container_start_page 235
container_title IEEE computer architecture letters
container_volume 23
creator Cha, Hanna
Lee, Sungchul
Ha, Yeonan
Jang, Hanhwi
Kim, Joonsung
Kim, Youngsok
description Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. In this letter, we present GCStack, a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls.
doi_str_mv 10.1109/LCA.2024.3476909
format Article
fullrecord <record><control><sourceid>crossref_RIE</sourceid><recordid>TN_cdi_ieee_primary_10711248</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10711248</ieee_id><sourcerecordid>10_1109_LCA_2024_3476909</sourcerecordid><originalsourceid>FETCH-LOGICAL-c147t-f915cb908296b105802ac3da9f6264e5bba8912786b592e50c9b63efb2687e033</originalsourceid><addsrcrecordid>eNpNkE1Lw0AURQdRsFb3LlzMH0h985kZdyFoLVQsaJcSZqYvbbRNZJIW-u9NbRFX9_E49y4OIbcMRoyBvZ_m2YgDlyMhU23BnpEBU0onGrQ8_7uVviRXbfsJILUwckA-xvlb58LXA83oeDan-T6skWYhNNu6q-olfcGwcnXVbmjZRDqLza5aHP49so2uQzqp22q56vrsmt-JGcYe3bg64DW5KN26xZtTDsn86fE9f06mr-NJnk2TwGTaJaVlKngLhlvtGSgD3AWxcLbUXEtU3jtjGU-N9spyVBCs1wJLz7VJEYQYEjjuhti0bcSy-I7VxsV9waA46Cl6PcVBT3HS01fujpUKEf_hKWNcGvEDe8JfzA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</title><source>IEEE Electronic Library (IEL)</source><creator>Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok</creator><creatorcontrib>Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok</creatorcontrib><description>Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. 
In this letter, we present GCStack , a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls.</description><identifier>ISSN: 1556-6056</identifier><identifier>EISSN: 1556-6064</identifier><identifier>DOI: 10.1109/LCA.2024.3476909</identifier><identifier>CODEN: ICALC3</identifier><language>eng</language><publisher>IEEE</publisher><subject>Accuracy ; Computer architecture ; CPI stack ; cycle accounting ; Degradation ; GPU ; Graphics processing units ; Hazards ; Instruction sets ; Micromechanical devices ; Pipelines ; Synchronization ; Tensors</subject><ispartof>IEEE computer architecture letters, 2024-07, Vol.23 (2), p.235-238</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c147t-f915cb908296b105802ac3da9f6264e5bba8912786b592e50c9b63efb2687e033</cites><orcidid>0000-0002-5432-7813 ; 0009-0003-5937-8550 ; 0009-0009-5549-7265 ; 0000-0002-3418-5299 ; 0000-0002-1015-9969 ; 
0000-0003-3722-4131</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10711248$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,777,781,793,27905,27906,54739</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10711248$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Cha, Hanna</creatorcontrib><creatorcontrib>Lee, Sungchul</creatorcontrib><creatorcontrib>Ha, Yeonan</creatorcontrib><creatorcontrib>Jang, Hanhwi</creatorcontrib><creatorcontrib>Kim, Joonsung</creatorcontrib><creatorcontrib>Kim, Youngsok</creatorcontrib><title>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</title><title>IEEE computer architecture letters</title><addtitle>LCA</addtitle><description>Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. In this letter, we present GCStack , a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. 
Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls.</description><subject>Accuracy</subject><subject>Computer architecture</subject><subject>CPI stack</subject><subject>cycle accounting</subject><subject>Degradation</subject><subject>GPU</subject><subject>Graphics processing units</subject><subject>Hazards</subject><subject>Instruction sets</subject><subject>Micromechanical devices</subject><subject>Pipelines</subject><subject>Synchronization</subject><subject>Tensors</subject><issn>1556-6056</issn><issn>1556-6064</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkE1Lw0AURQdRsFb3LlzMH0h985kZdyFoLVQsaJcSZqYvbbRNZJIW-u9NbRFX9_E49y4OIbcMRoyBvZ_m2YgDlyMhU23BnpEBU0onGrQ8_7uVviRXbfsJILUwckA-xvlb58LXA83oeDan-T6skWYhNNu6q-olfcGwcnXVbmjZRDqLza5aHP49so2uQzqp22q56vrsmt-JGcYe3bg64DW5KN26xZtTDsn86fE9f06mr-NJnk2TwGTaJaVlKngLhlvtGSgD3AWxcLbUXEtU3jtjGU-N9spyVBCs1wJLz7VJEYQYEjjuhti0bcSy-I7VxsV9waA46Cl6PcVBT3HS01fujpUKEf_hKWNcGvEDe8JfzA</recordid><startdate>202407</startdate><enddate>202407</enddate><creator>Cha, Hanna</creator><creator>Lee, Sungchul</creator><creator>Ha, Yeonan</creator><creator>Jang, Hanhwi</creator><creator>Kim, Joonsung</creator><creator>Kim, 
Youngsok</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-5432-7813</orcidid><orcidid>https://orcid.org/0009-0003-5937-8550</orcidid><orcidid>https://orcid.org/0009-0009-5549-7265</orcidid><orcidid>https://orcid.org/0000-0002-3418-5299</orcidid><orcidid>https://orcid.org/0000-0002-1015-9969</orcidid><orcidid>https://orcid.org/0000-0003-3722-4131</orcidid></search><sort><creationdate>202407</creationdate><title>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</title><author>Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c147t-f915cb908296b105802ac3da9f6264e5bba8912786b592e50c9b63efb2687e033</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Computer architecture</topic><topic>CPI stack</topic><topic>cycle accounting</topic><topic>Degradation</topic><topic>GPU</topic><topic>Graphics processing units</topic><topic>Hazards</topic><topic>Instruction sets</topic><topic>Micromechanical devices</topic><topic>Pipelines</topic><topic>Synchronization</topic><topic>Tensors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cha, Hanna</creatorcontrib><creatorcontrib>Lee, Sungchul</creatorcontrib><creatorcontrib>Ha, Yeonan</creatorcontrib><creatorcontrib>Jang, Hanhwi</creatorcontrib><creatorcontrib>Kim, Joonsung</creatorcontrib><creatorcontrib>Kim, Youngsok</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><jtitle>IEEE computer architecture 
letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cha, Hanna</au><au>Lee, Sungchul</au><au>Ha, Yeonan</au><au>Jang, Hanhwi</au><au>Kim, Joonsung</au><au>Kim, Youngsok</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</atitle><jtitle>IEEE computer architecture letters</jtitle><stitle>LCA</stitle><date>2024-07</date><risdate>2024</risdate><volume>23</volume><issue>2</issue><spage>235</spage><epage>238</epage><pages>235-238</pages><issn>1556-6056</issn><eissn>1556-6064</eissn><coden>ICALC3</coden><abstract>Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. In this letter, we present GCStack , a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. 
We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls.</abstract><pub>IEEE</pub><doi>10.1109/LCA.2024.3476909</doi><tpages>4</tpages><orcidid>https://orcid.org/0000-0002-5432-7813</orcidid><orcidid>https://orcid.org/0009-0003-5937-8550</orcidid><orcidid>https://orcid.org/0009-0009-5549-7265</orcidid><orcidid>https://orcid.org/0000-0002-3418-5299</orcidid><orcidid>https://orcid.org/0000-0002-1015-9969</orcidid><orcidid>https://orcid.org/0000-0003-3722-4131</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1556-6056
ispartof IEEE computer architecture letters, 2024-07, Vol.23 (2), p.235-238
issn 1556-6056
1556-6064
language eng
recordid cdi_ieee_primary_10711248
source IEEE Electronic Library (IEL)
subjects Accuracy
Computer architecture
CPI stack
cycle accounting
Degradation
GPU
Graphics processing units
Hazards
Instruction sets
Micromechanical devices
Pipelines
Synchronization
Tensors
title GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T13%3A34%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GCStack:%20A%20GPU%20Cycle%20Accounting%20Mechanism%20for%20Providing%20Accurate%20Insight%20Into%20GPU%20Performance&rft.jtitle=IEEE%20computer%20architecture%20letters&rft.au=Cha,%20Hanna&rft.date=2024-07&rft.volume=23&rft.issue=2&rft.spage=235&rft.epage=238&rft.pages=235-238&rft.issn=1556-6056&rft.eissn=1556-6064&rft.coden=ICALC3&rft_id=info:doi/10.1109/LCA.2024.3476909&rft_dat=%3Ccrossref_RIE%3E10_1109_LCA_2024_3476909%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10711248&rfr_iscdi=true