Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems

Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured hardware to support this evolving application paradigm. These systems contain hundreds to tens of thousands of accelerators, enabling peta- and exa-scale levels of compute for scientific workloads. Recent work demonstrated that power management (PM) can impact application performance in CPU-based HPC systems, even when machines have the same architecture and SKU (stock keeping unit). This variation occurs due to manufacturing variability and the chip's PM. However, while modern HPC systems widely employ accelerators such as GPUs, it is unclear how much this variability affects applications. Accordingly, we seek to characterize the extent of variation due to GPU PM in modern HPC and supercomputing systems. We study a variety of applications that stress different GPU components on five large-scale computing centers with modern GPUs: Oak Ridge's Summit, Sandia's Vortex, TACC's Frontera and Longhorn, and Livermore's Corona. These clusters use a variety of cooling methods and GPU vendors. In total, we collect over 18,800 hours of data across more than 90% of the GPUs in these clusters. Regardless of the application, cluster, GPU vendor, and cooling method, our results show significant variation: 8% (max 22%) average performance variation even though the GPU architecture and vendor SKU are identical within each cluster, with outliers up to 1.5X slower than the median GPU. These results highlight the difficulty in efficiently using existing GPU clusters for modern HPC and scientific workloads, and the need to embrace variability in future accelerator-based systems.
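The abstract summarizes variation as an average percentage deviation and a worst-case slowdown relative to the median GPU. As a rough illustration only (not the paper's methodology), the following Python sketch shows how such statistics could be computed from per-GPU runtimes; all GPU names and runtime values below are made-up placeholders, not measured data.

```python
# Hedged sketch: compute variability statistics of the kind quoted in the
# abstract (average variation and worst-case slowdown vs. the median GPU).
# The runtimes below are hypothetical placeholders, not data from the study.
from statistics import median, mean

# Hypothetical per-GPU runtimes (seconds) for one benchmark on one cluster.
runtimes = {"gpu00": 101.2, "gpu01": 98.7, "gpu02": 104.9,
            "gpu03": 99.5, "gpu04": 131.0, "gpu05": 100.1}

med = median(runtimes.values())

# Slowdown of each GPU relative to the median runtime (1.0 == median GPU).
slowdown = {gpu: t / med for gpu, t in runtimes.items()}

# Average spread around the median, expressed as a percentage.
avg_variation_pct = mean(abs(s - 1.0) for s in slowdown.values()) * 100
worst = max(slowdown, key=slowdown.get)

print(f"average variation vs. median: {avg_variation_pct:.1f}%")
print(f"worst outlier: {worst} at {slowdown[worst]:.2f}x the median runtime")
```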

Bibliographic Details
Main Authors: Sinha, Prasoon; Guliani, Akhil; Jain, Rutwik; Tran, Brandon; Sinclair, Matthew D; Venkataraman, Shivaram
Format: Article
Language: English
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Online Access: https://arxiv.org/abs/2208.11035
DOI: 10.48550/arxiv.2208.11035
Date: 2022-08-23
Source: arXiv.org