An Evaluation of Behaviors of S-NUCA CMPs Running Scientific Workload

Bibliographic details
Main authors: Foglia, P., Panicucci, F., Prete, C.A., Solinas, M.
Format: Conference proceeding
Language: eng
container_end_page 33
container_issue
container_start_page 26
container_title
container_volume
creator Foglia, P.
Panicucci, F.
Prete, C.A.
Solinas, M.
description Modern systems can put two or more processors on the same die (Chip Multiprocessors, CMP), each with its private caches, while the last-level caches can be either private or shared. As these systems are affected by the wire-delay problem, NUCA caches have been proposed to hide the effects of such delay and thereby increase performance. A CMP system that adopts a NUCA as its shared last-level cache has to maintain coherence among the lower, private levels of the cache hierarchy. As NUCA caches typically adopt a NoC as the communication infrastructure (in which the communication paradigm is message passing), the coherence protocol has to be directory based, similar to the ones proposed for classical DSM systems. Previous works focusing on NUCA-based CMP systems adopt a fixed topology (i.e., the physical position of cores and NUCA banks, and the communication infrastructure), each adopting a different coherence strategy. In this paper, we present an evaluation of an 8-CPU CMP system with two levels of cache, in which the L1s are private to each core, while the L2 is a Static-NUCA shared among all cores. We considered two different system topologies (the first with the eight CPUs connected to the NUCA on the same side, the second with half of the CPUs on one side and the others on the opposite side), and for both topologies we considered MESI and MOESI. The results indicate that processor topology has a much larger effect on performance and NoC bandwidth utilization than the coherence protocol, as a consequence of data mapping and of the distribution of accesses to the L2 cache, which is not uniform across the cache banks.
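The static (S-NUCA) mapping discussed in the abstract can be sketched as follows — a hypothetical illustration, not the paper's actual configuration: each cache line is assigned to a fixed bank by its address bits, so a skewed access pattern concentrates traffic on a few banks (and the NoC links leading to them), which is the non-uniform access distribution the authors observe.

```python
# Hypothetical sketch of S-NUCA static bank mapping. Assumptions (illustrative,
# not taken from the paper): 16 banks, 64-byte cache lines, bank selected from
# the low-order line-address bits.
NUM_BANKS = 16
LINE_SIZE = 64

def bank_of(addr: int) -> int:
    """Map a physical address to its fixed (static) NUCA bank."""
    line = addr // LINE_SIZE      # cache-line index
    return line % NUM_BANKS       # bank is determined by the address alone

def bank_load(addresses):
    """Count accesses per bank to expose a non-uniform distribution."""
    load = [0] * NUM_BANKS
    for a in addresses:
        load[bank_of(a)] += 1
    return load

# A strided access pattern lands every access on the same bank:
stride_accesses = [i * LINE_SIZE * NUM_BANKS for i in range(100)]
print(bank_load(stride_accesses)[0])  # all 100 accesses hit bank 0
```

Because the mapping is static, a hot bank cannot shed load by migrating lines (as a D-NUCA could), so topology — how far each core sits from the hot banks — dominates latency and link utilization.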
doi_str_mv 10.1109/DSD.2009.153
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISBN: 9780769537825; ISBN: 0769537820
ispartof 2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, 2009, p.26-33
issn
language eng
recordid cdi_ieee_primary_5349985
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Bandwidth
cache
Clocks
Delay effects
Design engineering
Design methodology
Digital systems
latency
mapping
Network-on-a-chip
NUCA
Protocols
Topology
Wire
wire-delay
title An Evaluation of Behaviors of S-NUCA CMPs Running Scientific Workload
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T12%3A54%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=An%20Evaluation%20of%20Behaviors%20of%20S-NUCA%20CMPs%20Running%20Scientific%20Workload&rft.btitle=2009%2012th%20Euromicro%20Conference%20on%20Digital%20System%20Design,%20Architectures,%20Methods%20and%20Tools&rft.au=Foglia,%20P.&rft.date=2009-08&rft.spage=26&rft.epage=33&rft.pages=26-33&rft.isbn=9780769537825&rft.isbn_list=0769537820&rft_id=info:doi/10.1109/DSD.2009.153&rft_dat=%3Cieee_6IE%3E5349985%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5349985&rfr_iscdi=true