Exploring Modern GPU Memory System Design Challenges through Accurate Modeling

This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU simulator, GPGPU-Sim, against our enhanced version of the simul...


Bibliographic Details
Published in: arXiv.org 2018-10
Main authors: Khairy, Mahmoud; Jain, Akshay; Aamodt, Tor; Rogers, Timothy G
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Khairy, Mahmoud
Jain, Akshay
Aamodt, Tor
Rogers, Timothy G
description This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU simulator, GPGPU-Sim, against our enhanced version of the simulator, updated to model the memory system of modern GPUs in more detail. Our enhanced GPU model is able to describe the NVIDIA Volta architecture in sufficient detail to reduce error in memory system event counters by as much as 66X. The reduced error in the memory system further reduces execution time error versus real hardware by 2.5X. To demonstrate the accuracy of our enhanced model against a real machine, we perform a counter-by-counter validation against an NVIDIA TITAN V Volta GPU, demonstrating the relative accuracy of the new simulator versus the publicly available model. We go on to demonstrate that the simpler model discounts the importance of advanced memory system designs such as out-of-order memory access scheduling, while overstating the impact of more heavily researched areas like L1 cache bypassing. Our results demonstrate that it is important for the academic community to enhance the level of detail in architecture simulators as system complexity continues to grow. As part of this detailed correlation and modeling effort, we developed a new Correlator toolset that includes a consolidation of applications from a variety of popular GPGPU benchmark suites, designed to run in reasonable simulation times. The Correlator also includes a database of hardware profiling results for all these applications on NVIDIA cards ranging from Fermi to Volta and a toolchain that enables users to gather correlation statistics and create detailed counter-by-counter hardware correlation plots with minimal effort.
doi_str_mv 10.48550/arxiv.1810.07269
format Article
fullrecord <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_1810_07269</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2409420848</sourcerecordid><originalsourceid>FETCH-LOGICAL-a528-2aadf68ce241cccca5e85801b5baaf98eae0655137da550d1e0320d6af88a5173</originalsourceid><addsrcrecordid>eNotj0tLw0AUhQdBsNT-AFcOuE6dRyaZLEusVWhVsK7DbXLzKEkmziTS_Htj690cuJxzOB8hd5wtfa0UewR7qn6WXE8PFooguiIzISX3tC_EDVk4d2SMiSAUSskZeVufutrYqi3ozmRoW7r5-KI7bIwd6efoemzoE7qqaGlcQl1jW6CjfWnNUJR0laaDhR7P2XoquSXXOdQOF_86J_vn9T5-8bbvm9d4tfVACe0JgCwPdIrC5-l0oFArzfhBHQDySCMgC5TiMsxgQso4MilYFkCuNSgeyjm5v9SeYZPOVg3YMfmDTs7Qk-Ph4uis-R7Q9cnRDLadNiXCZ5EvmPa1_AWkk1o1</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2409420848</pqid></control><display><type>article</type><title>Exploring Modern GPU Memory System Design Challenges through Accurate Modeling</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Khairy, Mahmoud ; Jain Akshay ; Aamodt, Tor ; Rogers, Timothy G</creator><creatorcontrib>Khairy, Mahmoud ; Jain Akshay ; Aamodt, Tor ; Rogers, Timothy G</creatorcontrib><description>This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU simulator, GPGPU-Sim, against our enhanced version of the simulator, updated to model the memory system of modern GPUs in more detail. Our enhanced GPU model is able to describe the NVIDIA Volta architecture in sufficient detail to reduce error in memory system even counters by as much as 66X. The reduced error in the memory system further reduces execution time error versus real hardware by 2.5X. 
To demonstrate the accuracy of our enhanced model against a real machine, we perform a counter-by-counter validation against an NVIDIA TITAN V Volta GPU, demonstrating the relative accuracy of the new simulator versus the publicly available model. We go on to demonstrate that the simpler model discounts the importance of advanced memory system designs such as out-of-order memory access scheduling, while overstating the impact of more heavily researched areas like L1 cache bypassing. Our results demonstrate that it is important for the academic community to enhance the level of detail in architecture simulators as system complexity continues to grow. As part of this detailed correlation and modeling effort, we developed a new Correlator toolset that includes a consolidation of applications from a variety of popular GPGPU benchmark suites, designed to run in reasonable simulation times. The Correlator also includes a database of hardware profiling results for all these applications on NVIDIA cards ranging from Fermi to Volta and a toolchain that enables users to gather correlation statistics and create detailed counter-by-counter hardware correlation plots with minimal effort.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.1810.07269</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Accuracy ; Computer architecture ; Computer Science - Hardware Architecture ; Computer simulation ; Correlation ; Design ; Discounts ; Error reduction ; Flight simulators ; Graphics processing units ; Hardware ; Model accuracy ; Modelling ; Systems design</subject><ispartof>arXiv.org, 2018-10</ispartof><rights>2018. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,784,885,27924</link.rule.ids><backlink>$$Uhttps://doi.org/10.1109/ISCA45697.2020.00047$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.1810.07269$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Khairy, Mahmoud</creatorcontrib><creatorcontrib>Jain Akshay</creatorcontrib><creatorcontrib>Aamodt, Tor</creatorcontrib><creatorcontrib>Rogers, Timothy G</creatorcontrib><title>Exploring Modern GPU Memory System Design Challenges through Accurate Modeling</title><title>arXiv.org</title><description>This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU simulator, GPGPU-Sim, against our enhanced version of the simulator, updated to model the memory system of modern GPUs in more detail. Our enhanced GPU model is able to describe the NVIDIA Volta architecture in sufficient detail to reduce error in memory system even counters by as much as 66X. The reduced error in the memory system further reduces execution time error versus real hardware by 2.5X. To demonstrate the accuracy of our enhanced model against a real machine, we perform a counter-by-counter validation against an NVIDIA TITAN V Volta GPU, demonstrating the relative accuracy of the new simulator versus the publicly available model. 
We go on to demonstrate that the simpler model discounts the importance of advanced memory system designs such as out-of-order memory access scheduling, while overstating the impact of more heavily researched areas like L1 cache bypassing. Our results demonstrate that it is important for the academic community to enhance the level of detail in architecture simulators as system complexity continues to grow. As part of this detailed correlation and modeling effort, we developed a new Correlator toolset that includes a consolidation of applications from a variety of popular GPGPU benchmark suites, designed to run in reasonable simulation times. The Correlator also includes a database of hardware profiling results for all these applications on NVIDIA cards ranging from Fermi to Volta and a toolchain that enables users to gather correlation statistics and create detailed counter-by-counter hardware correlation plots with minimal effort.</description><subject>Accuracy</subject><subject>Computer architecture</subject><subject>Computer Science - Hardware Architecture</subject><subject>Computer simulation</subject><subject>Correlation</subject><subject>Design</subject><subject>Discounts</subject><subject>Error reduction</subject><subject>Flight simulators</subject><subject>Graphics processing units</subject><subject>Hardware</subject><subject>Model accuracy</subject><subject>Modelling</subject><subject>Systems 
design</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotj0tLw0AUhQdBsNT-AFcOuE6dRyaZLEusVWhVsK7DbXLzKEkmziTS_Htj690cuJxzOB8hd5wtfa0UewR7qn6WXE8PFooguiIzISX3tC_EDVk4d2SMiSAUSskZeVufutrYqi3ozmRoW7r5-KI7bIwd6efoemzoE7qqaGlcQl1jW6CjfWnNUJR0laaDhR7P2XoquSXXOdQOF_86J_vn9T5-8bbvm9d4tfVACe0JgCwPdIrC5-l0oFArzfhBHQDySCMgC5TiMsxgQso4MilYFkCuNSgeyjm5v9SeYZPOVg3YMfmDTs7Qk-Ph4uis-R7Q9cnRDLadNiXCZ5EvmPa1_AWkk1o1</recordid><startdate>20181016</startdate><enddate>20181016</enddate><creator>Khairy, Mahmoud</creator><creator>Jain Akshay</creator><creator>Aamodt, Tor</creator><creator>Rogers, Timothy G</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20181016</creationdate><title>Exploring Modern GPU Memory System Design Challenges through Accurate Modeling</title><author>Khairy, Mahmoud ; Jain Akshay ; Aamodt, Tor ; Rogers, Timothy G</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a528-2aadf68ce241cccca5e85801b5baaf98eae0655137da550d1e0320d6af88a5173</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Accuracy</topic><topic>Computer architecture</topic><topic>Computer Science - Hardware Architecture</topic><topic>Computer 
simulation</topic><topic>Correlation</topic><topic>Design</topic><topic>Discounts</topic><topic>Error reduction</topic><topic>Flight simulators</topic><topic>Graphics processing units</topic><topic>Hardware</topic><topic>Model accuracy</topic><topic>Modelling</topic><topic>Systems design</topic><toplevel>online_resources</toplevel><creatorcontrib>Khairy, Mahmoud</creatorcontrib><creatorcontrib>Jain Akshay</creatorcontrib><creatorcontrib>Aamodt, Tor</creatorcontrib><creatorcontrib>Rogers, Timothy G</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Khairy, Mahmoud</au><au>Jain Akshay</au><au>Aamodt, Tor</au><au>Rogers, Timothy G</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exploring Modern GPU Memory System Design Challenges through Accurate 
Modeling</atitle><jtitle>arXiv.org</jtitle><date>2018-10-16</date><risdate>2018</risdate><eissn>2331-8422</eissn><abstract>This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU simulator, GPGPU-Sim, against our enhanced version of the simulator, updated to model the memory system of modern GPUs in more detail. Our enhanced GPU model is able to describe the NVIDIA Volta architecture in sufficient detail to reduce error in memory system even counters by as much as 66X. The reduced error in the memory system further reduces execution time error versus real hardware by 2.5X. To demonstrate the accuracy of our enhanced model against a real machine, we perform a counter-by-counter validation against an NVIDIA TITAN V Volta GPU, demonstrating the relative accuracy of the new simulator versus the publicly available model. We go on to demonstrate that the simpler model discounts the importance of advanced memory system designs such as out-of-order memory access scheduling, while overstating the impact of more heavily researched areas like L1 cache bypassing. Our results demonstrate that it is important for the academic community to enhance the level of detail in architecture simulators as system complexity continues to grow. As part of this detailed correlation and modeling effort, we developed a new Correlator toolset that includes a consolidation of applications from a variety of popular GPGPU benchmark suites, designed to run in reasonable simulation times. 
The Correlator also includes a database of hardware profiling results for all these applications on NVIDIA cards ranging from Fermi to Volta and a toolchain that enables users to gather correlation statistics and create detailed counter-by-counter hardware correlation plots with minimal effort.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.1810.07269</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2018-10
issn 2331-8422
language eng
recordid cdi_arxiv_primary_1810_07269
source arXiv.org; Free E- Journals
subjects Accuracy
Computer architecture
Computer Science - Hardware Architecture
Computer simulation
Correlation
Design
Discounts
Error reduction
Flight simulators
Graphics processing units
Hardware
Model accuracy
Modelling
Systems design
title Exploring Modern GPU Memory System Design Challenges through Accurate Modeling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T06%3A34%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploring%20Modern%20GPU%20Memory%20System%20Design%20Challenges%20through%20Accurate%20Modeling&rft.jtitle=arXiv.org&rft.au=Khairy,%20Mahmoud&rft.date=2018-10-16&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1810.07269&rft_dat=%3Cproquest_arxiv%3E2409420848%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2409420848&rft_id=info:pmid/&rfr_iscdi=true