GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance
Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the st...
Published in: | IEEE computer architecture letters 2024-07, Vol.23 (2), p.235-238 |
---|---|
Main authors: | Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 238 |
---|---|
container_issue | 2 |
container_start_page | 235 |
container_title | IEEE computer architecture letters |
container_volume | 23 |
creator | Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok |
description | Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. In this letter, we present GCStack, a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls. |
doi_str_mv | 10.1109/LCA.2024.3476909 |
format | Article |
fullrecord | <record><control><sourceid>crossref_RIE</sourceid><recordid>TN_cdi_ieee_primary_10711248</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10711248</ieee_id><sourcerecordid>10_1109_LCA_2024_3476909</sourcerecordid><originalsourceid>FETCH-LOGICAL-c147t-f915cb908296b105802ac3da9f6264e5bba8912786b592e50c9b63efb2687e033</originalsourceid><addsrcrecordid>eNpNkE1Lw0AURQdRsFb3LlzMH0h985kZdyFoLVQsaJcSZqYvbbRNZJIW-u9NbRFX9_E49y4OIbcMRoyBvZ_m2YgDlyMhU23BnpEBU0onGrQ8_7uVviRXbfsJILUwckA-xvlb58LXA83oeDan-T6skWYhNNu6q-olfcGwcnXVbmjZRDqLza5aHP49so2uQzqp22q56vrsmt-JGcYe3bg64DW5KN26xZtTDsn86fE9f06mr-NJnk2TwGTaJaVlKngLhlvtGSgD3AWxcLbUXEtU3jtjGU-N9spyVBCs1wJLz7VJEYQYEjjuhti0bcSy-I7VxsV9waA46Cl6PcVBT3HS01fujpUKEf_hKWNcGvEDe8JfzA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</title><source>IEEE Electronic Library (IEL)</source><creator>Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok</creator><creatorcontrib>Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok</creatorcontrib><description>Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. 
In this letter, we present GCStack , a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls.</description><identifier>ISSN: 1556-6056</identifier><identifier>EISSN: 1556-6064</identifier><identifier>DOI: 10.1109/LCA.2024.3476909</identifier><identifier>CODEN: ICALC3</identifier><language>eng</language><publisher>IEEE</publisher><subject>Accuracy ; Computer architecture ; CPI stack ; cycle accounting ; Degradation ; GPU ; Graphics processing units ; Hazards ; Instruction sets ; Micromechanical devices ; Pipelines ; Synchronization ; Tensors</subject><ispartof>IEEE computer architecture letters, 2024-07, Vol.23 (2), p.235-238</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c147t-f915cb908296b105802ac3da9f6264e5bba8912786b592e50c9b63efb2687e033</cites><orcidid>0000-0002-5432-7813 ; 0009-0003-5937-8550 ; 0009-0009-5549-7265 ; 0000-0002-3418-5299 ; 0000-0002-1015-9969 ; 
0000-0003-3722-4131</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10711248$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,777,781,793,27905,27906,54739</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10711248$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Cha, Hanna</creatorcontrib><creatorcontrib>Lee, Sungchul</creatorcontrib><creatorcontrib>Ha, Yeonan</creatorcontrib><creatorcontrib>Jang, Hanhwi</creatorcontrib><creatorcontrib>Kim, Joonsung</creatorcontrib><creatorcontrib>Kim, Youngsok</creatorcontrib><title>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</title><title>IEEE computer architecture letters</title><addtitle>LCA</addtitle><description>Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. In this letter, we present GCStack , a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. 
Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls.</description><subject>Accuracy</subject><subject>Computer architecture</subject><subject>CPI stack</subject><subject>cycle accounting</subject><subject>Degradation</subject><subject>GPU</subject><subject>Graphics processing units</subject><subject>Hazards</subject><subject>Instruction sets</subject><subject>Micromechanical devices</subject><subject>Pipelines</subject><subject>Synchronization</subject><subject>Tensors</subject><issn>1556-6056</issn><issn>1556-6064</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkE1Lw0AURQdRsFb3LlzMH0h985kZdyFoLVQsaJcSZqYvbbRNZJIW-u9NbRFX9_E49y4OIbcMRoyBvZ_m2YgDlyMhU23BnpEBU0onGrQ8_7uVviRXbfsJILUwckA-xvlb58LXA83oeDan-T6skWYhNNu6q-olfcGwcnXVbmjZRDqLza5aHP49so2uQzqp22q56vrsmt-JGcYe3bg64DW5KN26xZtTDsn86fE9f06mr-NJnk2TwGTaJaVlKngLhlvtGSgD3AWxcLbUXEtU3jtjGU-N9spyVBCs1wJLz7VJEYQYEjjuhti0bcSy-I7VxsV9waA46Cl6PcVBT3HS01fujpUKEf_hKWNcGvEDe8JfzA</recordid><startdate>202407</startdate><enddate>202407</enddate><creator>Cha, Hanna</creator><creator>Lee, Sungchul</creator><creator>Ha, Yeonan</creator><creator>Jang, Hanhwi</creator><creator>Kim, Joonsung</creator><creator>Kim, 
Youngsok</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-5432-7813</orcidid><orcidid>https://orcid.org/0009-0003-5937-8550</orcidid><orcidid>https://orcid.org/0009-0009-5549-7265</orcidid><orcidid>https://orcid.org/0000-0002-3418-5299</orcidid><orcidid>https://orcid.org/0000-0002-1015-9969</orcidid><orcidid>https://orcid.org/0000-0003-3722-4131</orcidid></search><sort><creationdate>202407</creationdate><title>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</title><author>Cha, Hanna ; Lee, Sungchul ; Ha, Yeonan ; Jang, Hanhwi ; Kim, Joonsung ; Kim, Youngsok</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c147t-f915cb908296b105802ac3da9f6264e5bba8912786b592e50c9b63efb2687e033</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Computer architecture</topic><topic>CPI stack</topic><topic>cycle accounting</topic><topic>Degradation</topic><topic>GPU</topic><topic>Graphics processing units</topic><topic>Hazards</topic><topic>Instruction sets</topic><topic>Micromechanical devices</topic><topic>Pipelines</topic><topic>Synchronization</topic><topic>Tensors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cha, Hanna</creatorcontrib><creatorcontrib>Lee, Sungchul</creatorcontrib><creatorcontrib>Ha, Yeonan</creatorcontrib><creatorcontrib>Jang, Hanhwi</creatorcontrib><creatorcontrib>Kim, Joonsung</creatorcontrib><creatorcontrib>Kim, Youngsok</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><jtitle>IEEE computer architecture 
letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cha, Hanna</au><au>Lee, Sungchul</au><au>Ha, Yeonan</au><au>Jang, Hanhwi</au><au>Kim, Joonsung</au><au>Kim, Youngsok</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance</atitle><jtitle>IEEE computer architecture letters</jtitle><stitle>LCA</stitle><date>2024-07</date><risdate>2024</risdate><volume>23</volume><issue>2</issue><spage>235</spage><epage>238</epage><pages>235-238</pages><issn>1556-6056</issn><eissn>1556-6064</eissn><coden>ICALC3</coden><abstract>Cycles Per Instruction (CPI) stacks help computer architects gain insight into the performance of their target architectures and applications. To bring the benefits of CPI stacks to Graphics Processing Units (GPUs), prior studies have proposed GPU cycle accounting mechanisms that can identify the stall cycles and their stall events on GPU architectures. Unfortunately, the prior studies cannot provide accurate insight into the GPU performance due to their coarse-grained, priority-driven, and issue-centric cycle accounting mechanisms. In this letter, we present GCStack , a fine-grained GPU cycle accounting mechanism that constructs accurate CPI stacks and accurately identifies primary GPU performance bottlenecks. GCStack first exposes all the stall events of the outstanding warps of a warp scheduler, most of which get hidden by the existing mechanisms. Then, GCStack defers the classification of structural stalls, which the existing mechanisms cannot correctly identify with their issue-stage-centric stall classification, to the later stages of the GPU pipeline. 
We implement GCStack on Accel-Sim and show that GCStack provides more accurate CPI stacks and GPU performance insight than GSI, the state-of-the-art GPU cycle accounting mechanism whose primary focus is on characterizing memory-related stalls.</abstract><pub>IEEE</pub><doi>10.1109/LCA.2024.3476909</doi><tpages>4</tpages><orcidid>https://orcid.org/0000-0002-5432-7813</orcidid><orcidid>https://orcid.org/0009-0003-5937-8550</orcidid><orcidid>https://orcid.org/0009-0009-5549-7265</orcidid><orcidid>https://orcid.org/0000-0002-3418-5299</orcidid><orcidid>https://orcid.org/0000-0002-1015-9969</orcidid><orcidid>https://orcid.org/0000-0003-3722-4131</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1556-6056 |
ispartof | IEEE computer architecture letters, 2024-07, Vol.23 (2), p.235-238 |
issn | 1556-6056 1556-6064 |
language | eng |
recordid | cdi_ieee_primary_10711248 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy ; Computer architecture ; CPI stack ; cycle accounting ; Degradation ; GPU ; Graphics processing units ; Hazards ; Instruction sets ; Micromechanical devices ; Pipelines ; Synchronization ; Tensors |
title | GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T13%3A34%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GCStack:%20A%20GPU%20Cycle%20Accounting%20Mechanism%20for%20Providing%20Accurate%20Insight%20Into%20GPU%20Performance&rft.jtitle=IEEE%20computer%20architecture%20letters&rft.au=Cha,%20Hanna&rft.date=2024-07&rft.volume=23&rft.issue=2&rft.spage=235&rft.epage=238&rft.pages=235-238&rft.issn=1556-6056&rft.eissn=1556-6064&rft.coden=ICALC3&rft_id=info:doi/10.1109/LCA.2024.3476909&rft_dat=%3Ccrossref_RIE%3E10_1109_LCA_2024_3476909%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10711248&rfr_iscdi=true |
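
The description above contrasts priority-driven cycle accounting, which hides most stall events, with fine-grained accounting that exposes the stall events of all outstanding warps. The following Python sketch illustrates that contrast only in spirit. It is not the GCStack implementation: the stall labels, the `PRIORITY` order, the function names, and the even split of a stalled cycle across warps are all assumptions made for illustration, since this record does not describe the paper's actual categories or attribution rule.

```python
from collections import Counter

# Hypothetical stall labels and priority order (assumed; not taken from the paper).
PRIORITY = ["memory", "sync", "structural", "data_hazard"]


def priority_accounting(cycles):
    """Charge each stalled cycle to ONE event, picked by a fixed priority order."""
    stack = Counter()
    for warp_events in cycles:
        if not warp_events:
            stack["idle"] += 1.0          # no outstanding warps this cycle
        elif "issued" in warp_events:
            stack["base"] += 1.0          # a warp issued, so the cycle is useful
        else:
            chosen = min(warp_events, key=PRIORITY.index)
            stack[chosen] += 1.0          # every other stall event is hidden
    return stack


def fine_grained_accounting(cycles):
    """Split each stalled cycle evenly across all warps' stall events (assumed rule)."""
    stack = Counter()
    for warp_events in cycles:
        if not warp_events:
            stack["idle"] += 1.0
        elif "issued" in warp_events:
            stack["base"] += 1.0
        else:
            share = 1.0 / len(warp_events)
            for event in warp_events:
                stack[event] += share     # each warp's stall stays visible
    return stack


# Each inner list holds one event per outstanding warp in that cycle.
cycles = [
    ["memory", "data_hazard", "data_hazard", "data_hazard"],
    ["issued", "memory"],
    ["structural", "structural"],
]
coarse = priority_accounting(cycles)
fine = fine_grained_accounting(cycles)
```

In this toy trace, the priority-driven variant charges the whole first cycle to the single memory stall, while the fine-grained variant keeps the three data-hazard stalls visible. Both variants still account for every cycle, so the stack heights match. Deferring structural-stall classification to later pipeline stages, the second idea in the description, is not modelled here.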