An Evaluation of Behaviors of S-NUCA CMPs Running Scientific Workload
Modern systems are able to put two or more processors on the same die (Chip Multiprocessors, CMP), each with its private caches, while the last level caches can be either private or shared. As these systems are affected by the wire delay problem, NUCA caches have been proposed to hide the effects of...
Saved in:
Main authors: | Foglia, P. ; Panicucci, F. ; Prete, C.A. ; Solinas, M. |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | Bandwidth ; cache ; Clocks ; Delay effects ; Design engineering ; Design methodology ; Digital systems ; latency ; mapping ; Network-on-a-chip ; NUCA ; Protocols ; Topology ; Wire ; wire-delay |
Online access: | Order full text |
container_end_page | 33 |
---|---|
container_issue | |
container_start_page | 26 |
container_title | 2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools |
container_volume | |
creator | Foglia, P. ; Panicucci, F. ; Prete, C.A. ; Solinas, M. |
description | Modern systems are able to put two or more processors on the same die (Chip Multiprocessors, CMP), each with its private caches, while the last-level caches can be either private or shared. As these systems are affected by the wire-delay problem, NUCA caches have been proposed to hide the effects of such delay in order to increase performance. A CMP system that adopts a NUCA as its shared last-level cache has to be able to maintain coherence among the lowest, private levels of the cache hierarchy. As NUCA caches typically adopt a NoC as the communication infrastructure (in which the communication paradigm is message passing), the coherence protocol has to be directory based, similar to the ones proposed for classical DSM systems. Previous works focusing on NUCA-based CMP systems each adopt a fixed topology (i.e., the physical position of cores and NUCA banks, and the communication infrastructure) together with a different coherence strategy. In this paper, we present an evaluation of an 8-CPU CMP system with two levels of cache, in which the L1s are private to each core, while the L2 is a Static NUCA shared among all cores. We considered two different system topologies (the first with the eight CPUs connected to the NUCA on the same side, the second with half of the CPUs on one side and the others on the opposite side), and for both topologies we considered MESI and MOESI. The results indicate that processor topology has a much greater effect on performance and NoC bandwidth utilization than the coherence protocol, as a consequence of the data mapping and of the distribution of accesses to the L2 cache, which is not uniform across the cache banks. (A minimal bank-mapping sketch illustrating this skew follows the record fields below.) |
doi_str_mv | 10.1109/DSD.2009.153 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9780769537825 ; ISBN: 0769537820 |
ispartof | 2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, 2009, p.26-33 |
issn | |
language | eng |
recordid | cdi_ieee_primary_5349985 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Bandwidth ; cache ; Clocks ; Delay effects ; Design engineering ; Design methodology ; Digital systems ; latency ; mapping ; Network-on-a-chip ; NUCA ; Protocols ; Topology ; Wire ; wire-delay |
title | An Evaluation of Behaviors of S-NUCA CMPs Running Scientific Workload |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T12%3A54%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=An%20Evaluation%20of%20Behaviors%20of%20S-NUCA%20CMPs%20Running%20Scientific%20Workload&rft.btitle=2009%2012th%20Euromicro%20Conference%20on%20Digital%20System%20Design,%20Architectures,%20Methods%20and%20Tools&rft.au=Foglia,%20P.&rft.date=2009-08&rft.spage=26&rft.epage=33&rft.pages=26-33&rft.isbn=9780769537825&rft.isbn_list=0769537820&rft_id=info:doi/10.1109/DSD.2009.153&rft_dat=%3Cieee_6IE%3E5349985%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5349985&rfr_iscdi=true |
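The skew in L2 bank accesses mentioned in the abstract follows from how a Static NUCA (S-NUCA) maps cache blocks to banks: the bank index is a fixed function of the block address, so regular, strided access patterns common in scientific kernels can concentrate traffic on a few banks. Below is a minimal sketch in C, assuming a hypothetical 16-bank L2 with 64-byte blocks and bank selection taken from the low-order block-address bits; these parameters are illustrative assumptions, not the configuration evaluated in the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative S-NUCA parameters (assumptions, not the paper's setup). */
#define BLOCK_BITS 6                  /* 64-byte cache blocks            */
#define BANK_BITS  4                  /* 16 L2 banks                     */
#define NUM_BANKS  (1u << BANK_BITS)

/* Static NUCA: the home bank is a fixed function of the block address. */
static unsigned snuca_bank(uint64_t paddr) {
    return (unsigned)((paddr >> BLOCK_BITS) & (NUM_BANKS - 1));
}

int main(void) {
    unsigned hits[NUM_BANKS] = {0};

    /* A 256-byte strided sweep, typical of scientific kernels, lands on
     * only a subset of banks under this fixed mapping. */
    for (uint64_t a = 0; a < (1u << 20); a += 256)
        hits[snuca_bank(a)]++;

    for (unsigned b = 0; b < NUM_BANKS; b++)
        printf("bank %2u: %u accesses\n", b, hits[b]);
    return 0;
}
```

With the 256-byte stride shown, only banks 0, 4, 8, and 12 are ever touched. Under such a mapping, where a core sits relative to the hot banks (the processor topology) shapes NoC traffic far more than the choice between MESI and MOESI, which is consistent with the paper's conclusion.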