An Evaluation of Behaviors of S-NUCA CMPs Running Scientific Workload

Bibliographic details
Main authors: Foglia, P., Panicucci, F., Prete, C.A., Solinas, M.
Format: Conference proceeding
Language: eng
container_end_page 33
container_issue
container_start_page 26
container_title
container_volume
creator Foglia, P.
Panicucci, F.
Prete, C.A.
Solinas, M.
description Modern systems can put two or more processors on the same die (Chip Multiprocessors, CMP), each with its private caches, while the last-level caches can be either private or shared. As these systems are affected by the wire-delay problem, NUCA caches have been proposed to hide the effects of such delay and thereby increase performance. A CMP system that adopts a NUCA as its shared last-level cache has to maintain coherence among the lower, private levels of the cache hierarchy. As NUCA caches typically adopt a NoC as the communication infrastructure (in which the communication paradigm is message passing), the coherence protocol has to be directory based, similar to the ones proposed for classical DSM systems. Previous works focusing on NUCA-based CMP systems adopt a fixed topology (i.e., the physical position of cores and NUCA banks, and the communication infrastructure), each adopting a different coherence strategy. In this paper, we present an evaluation of an 8-CPU CMP system with two levels of cache, in which the L1s are private to each core, while the L2 is a Static-NUCA shared among all cores. We considered two different system topologies (the first with the eight CPUs connected to the NUCA on the same side, the second with half of the CPUs on one side and the others on the opposite side), and for both topologies we considered MESI and MOESI. The results indicate that processor topology has a much larger effect on performance and NoC bandwidth utilization than the coherence protocol, as a consequence of data mapping and of the distribution of accesses to the L2 cache, which is not uniform across the cache banks.
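The static (S-NUCA) mapping discussed in the abstract can be sketched as follows — a hypothetical illustration, not the paper's actual configuration: each cache line is assigned to a fixed bank by its address bits, so a skewed access pattern concentrates traffic on a few banks (and the NoC links leading to them), which is the non-uniform access distribution the authors observe.

```python
# Hypothetical sketch of S-NUCA static bank mapping. Assumptions (illustrative,
# not taken from the paper): 16 banks, 64-byte cache lines, bank selected from
# the low-order line-address bits.
NUM_BANKS = 16
LINE_SIZE = 64

def bank_of(addr: int) -> int:
    """Map a physical address to its fixed (static) NUCA bank."""
    line = addr // LINE_SIZE      # cache-line index
    return line % NUM_BANKS       # bank is determined by the address alone

def bank_load(addresses):
    """Count accesses per bank to expose a non-uniform distribution."""
    load = [0] * NUM_BANKS
    for a in addresses:
        load[bank_of(a)] += 1
    return load

# A strided access pattern lands every access on the same bank:
stride_accesses = [i * LINE_SIZE * NUM_BANKS for i in range(100)]
print(bank_load(stride_accesses)[0])  # all 100 accesses hit bank 0
```

Because the mapping is static, a hot bank cannot shed load by migrating lines (as a D-NUCA could), so topology — how far each core sits from the hot banks — dominates latency and link utilization.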
doi_str_mv 10.1109/DSD.2009.153
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISBN: 9780769537825; ISBN: 0769537820
ispartof 2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, 2009, p.26-33
issn
language eng
recordid cdi_ieee_primary_5349985
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Bandwidth
cache
Clocks
Delay effects
Design engineering
Design methodology
Digital systems
latency
mapping
Network-on-a-chip
NUCA
Protocols
Topology
Wire
wire-delay
title An Evaluation of Behaviors of S-NUCA CMPs Running Scientific Workload
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T12%3A54%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=An%20Evaluation%20of%20Behaviors%20of%20S-NUCA%20CMPs%20Running%20Scientific%20Workload&rft.btitle=2009%2012th%20Euromicro%20Conference%20on%20Digital%20System%20Design,%20Architectures,%20Methods%20and%20Tools&rft.au=Foglia,%20P.&rft.date=2009-08&rft.spage=26&rft.epage=33&rft.pages=26-33&rft.isbn=9780769537825&rft.isbn_list=0769537820&rft_id=info:doi/10.1109/DSD.2009.153&rft_dat=%3Cieee_6IE%3E5349985%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5349985&rfr_iscdi=true