A Cost-Efficient L1-L2 Multicore Interconnect: Performance, Power, and Area Considerations

Processor manufacturers use advances in manufacturing technologies to increase the number of cores on chip in order to scale performance in a cost-efficient manner. As the number of cores scales up, not all cores can be directly connected to the main memory and there is a need for hierarchy, for exa...

Full description

Bibliographic details
Published in:	IEEE transactions on circuits and systems. I, Regular papers, 2011-03, Vol.58 (3), p.529-538
Main authors:	Golander, Amit; Levison, Nadav; Heymann, Omer; Briskman, Alexander; Wolski, Mark J; Robinson, Eric F
Format:	Article
Language:	eng
Subjects:	Arbitration; Chip multiprocessor (CMP); Circuits; Clocks; Clusters; computer architecture; Consumption; Hierarchies; Integrated circuit interconnections; interconnect architectures; Latches; leakage; Metals; Microprocessors; multiprocessors; power; Power consumption; ring; Switches; Switching; Timing; Wire

container_end_page	538
container_issue	3
container_start_page	529
container_title	IEEE transactions on circuits and systems. I, Regular papers
container_volume	58
creator	Golander, Amit; Levison, Nadav; Heymann, Omer; Briskman, Alexander; Wolski, Mark J; Robinson, Eric F
description	Processor manufacturers use advances in manufacturing technologies to increase the number of cores on chip in order to scale performance in a cost-efficient manner. As the number of cores scales up, not all cores can be directly connected to the main memory and there is a need for hierarchy, for example, by arranging them in clusters that share L2 caches. This paper focuses on designing cost-efficient L1-L2 interconnects. We discuss performance and power- and area-consumption considerations for a real processor designed in 45-nm technology. We explain the architectures and heuristics developed, including a smart floorplan with instance flips to address interconnect latency, customized decentralized arbitration schemes tailored per transaction type, and heterogeneous Vt device assignment to reduce overall power consumption, taking into account the expected switching factors. These and other methods worked together to achieve high throughput in a power-efficient interconnect that consumes less than 3% of the compute cluster area.
doi_str_mv	10.1109/TCSI.2010.2073832
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_5644727</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5644727</ieee_id><sourcerecordid>1671228585</sourcerecordid><originalsourceid>FETCH-LOGICAL-c255t-a4f00370467f6bccaa3a9e4e714701bbcb66c7b8d502f60cb6facb0bc45d5e2b3</originalsourceid><addsrcrecordid>eNpdkE9LAzEQxRdRsFY_gHhZPHno6iSbP7veylK1ULFgvXgJ2XQCW9qkJlvEb29KiwdPM2_m94bhZdk1gXtCoH5YNO_TewpJUpBlVdKTbEA4rwqoQJzue1YXaVydZxcxrgBoDSUZZJ_jvPGxLybWdqZD1-czUsxo_rpb953xAfOp6zEY7xya_jGfY7A-bLQzOMrn_hvDKNdumY8D6nTJxW6JQfdd6i6zM6vXEa-OdZh9PE0WzUsxe3ueNuNZYSjnfaGZBSglMCGtaI3RutQ1MpSESSBta1ohjGyrJQdqBSRptWmhNYwvOdK2HGZ3h7vb4L92GHu16aLB9Vo79LuoiJCE0opXPKG3_9CV3wWXvlNpWwlW12WCyAEywccY0Kpt6DY6_CgCah-22oet9mGrY9jJc3PwdIj4x3PBmKSy_AVPdXon</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>853864993</pqid></control><display><type>article</type><title>A Cost-Efficient L1-L2 Multicore Interconnect: Performance, Power, and Area Considerations</title><source>IEEE Electronic Library (IEL)</source><creator>Golander, Amit ; Levison, Nadav ; Heymann, Omer ; Briskman, Alexander ; Wolski, Mark J ; Robinson, Eric F</creator><creatorcontrib>Golander, Amit ; Levison, Nadav ; Heymann, Omer ; Briskman, Alexander ; Wolski, Mark J ; Robinson, Eric F</creatorcontrib><description>Processor manufacturers use advances in manufacturing technologies to increase the number of cores on chip in order to scale performance in a cost-efficient manner. As the number of cores scales up, not all cores can be directly connected to the main memory and there is a need for hierarchy, for example, by arranging them in clusters that share L2 caches. This paper focuses on designing cost-efficient L1-L2 interconnects. We discuss performance and power- and area-consumption considerations for a real processor designed in 45-nm technology. 
We explain the architectures and heuristics developed, including a smart floorplan with instance flips to address interconnect latency, customized decentralized arbitration schemes tailored per transaction type, and heterogeneous Vt device assignment to reduce overall power consumption, taking into account the expected switching factors. These and other methods worked together to achieve high throughput in a power-efficient interconnect that consumes less than 3% of the compute cluster area.</description><identifier>ISSN: 1549-8328</identifier><identifier>EISSN: 1558-0806</identifier><identifier>DOI: 10.1109/TCSI.2010.2073832</identifier><identifier>CODEN: ITCSCH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Arbitration ; Chip multiprocessor (CMP) ; Circuits ; Clocks ; Clusters ; computer architecture ; Consumption ; Hierarchies ; Integrated circuit interconnections ; interconnect architectures ; Latches ; leakage ; Metals ; Microprocessors ; multiprocessors ; power ; Power consumption ; ring ; Switches ; Switching ; Timing ; Wire</subject><ispartof>IEEE transactions on circuits and systems. I, Regular papers, 2011-03, Vol.58 (3), p.529-538</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Mar 2011</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c255t-a4f00370467f6bccaa3a9e4e714701bbcb66c7b8d502f60cb6facb0bc45d5e2b3</citedby><cites>FETCH-LOGICAL-c255t-a4f00370467f6bccaa3a9e4e714701bbcb66c7b8d502f60cb6facb0bc45d5e2b3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5644727$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5644727$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Golander, Amit</creatorcontrib><creatorcontrib>Levison, Nadav</creatorcontrib><creatorcontrib>Heymann, Omer</creatorcontrib><creatorcontrib>Briskman, Alexander</creatorcontrib><creatorcontrib>Wolski, Mark J</creatorcontrib><creatorcontrib>Robinson, Eric F</creatorcontrib><title>A Cost-Efficient L1-L2 Multicore Interconnect: Performance, Power, and Area Considerations</title><title>IEEE transactions on circuits and systems. I, Regular papers</title><addtitle>TCSI</addtitle><description>Processor manufacturers use advances in manufacturing technologies to increase the number of cores on chip in order to scale performance in a cost-efficient manner. As the number of cores scales up, not all cores can be directly connected to the main memory and there is a need for hierarchy, for example, by arranging them in clusters that share L2 caches. This paper focuses on designing cost-efficient L1-L2 interconnects. We discuss performance and power- and area-consumption considerations for a real processor designed in 45-nm technology. 
We explain the architectures and heuristics developed, including a smart floorplan with instance flips to address interconnect latency, customized decentralized arbitration schemes tailored per transaction type, and heterogeneous Vt device assignment to reduce overall power consumption, taking into account the expected switching factors. These and other methods worked together to achieve high throughput in a power-efficient interconnect that consumes less than 3% of the compute cluster area.</description><subject>Arbitration</subject><subject>Chip multiprocessor (CMP)</subject><subject>Circuits</subject><subject>Clocks</subject><subject>Clusters</subject><subject>computer architecture</subject><subject>Consumption</subject><subject>Hierarchies</subject><subject>Integrated circuit interconnections</subject><subject>interconnect architectures</subject><subject>Latches</subject><subject>leakage</subject><subject>Metals</subject><subject>Microprocessors</subject><subject>multiprocessors</subject><subject>power</subject><subject>Power consumption</subject><subject>ring</subject><subject>Switches</subject><subject>Switching</subject><subject>Timing</subject><subject>Wire</subject><issn>1549-8328</issn><issn>1558-0806</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE9LAzEQxRdRsFY_gHhZPHno6iSbP7veylK1ULFgvXgJ2XQCW9qkJlvEb29KiwdPM2_m94bhZdk1gXtCoH5YNO_TewpJUpBlVdKTbEA4rwqoQJzue1YXaVydZxcxrgBoDSUZZJ_jvPGxLybWdqZD1-czUsxo_rpb953xAfOp6zEY7xya_jGfY7A-bLQzOMrn_hvDKNdumY8D6nTJxW6JQfdd6i6zM6vXEa-OdZh9PE0WzUsxe3ueNuNZYSjnfaGZBSglMCGtaI3RutQ1MpSESSBta1ohjGyrJQdqBSRptWmhNYwvOdK2HGZ3h7vb4L92GHu16aLB9Vo79LuoiJCE0opXPKG3_9CV3wWXvlNpWwlW12WCyAEywccY0Kpt6DY6_CgCah-22oet9mGrY9jJc3PwdIj4x3PBmKSy_AVPdXon</recordid><startdate>201103</startdate><enddate>201103</enddate><creator>Golander, Amit</creator><creator>Levison, Nadav</creator><creator>Heymann, 
Omer</creator><creator>Briskman, Alexander</creator><creator>Wolski, Mark J</creator><creator>Robinson, Eric F</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>201103</creationdate><title>A Cost-Efficient L1-L2 Multicore Interconnect: Performance, Power, and Area Considerations</title><author>Golander, Amit ; Levison, Nadav ; Heymann, Omer ; Briskman, Alexander ; Wolski, Mark J ; Robinson, Eric F</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c255t-a4f00370467f6bccaa3a9e4e714701bbcb66c7b8d502f60cb6facb0bc45d5e2b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Arbitration</topic><topic>Chip multiprocessor (CMP)</topic><topic>Circuits</topic><topic>Clocks</topic><topic>Clusters</topic><topic>computer architecture</topic><topic>Consumption</topic><topic>Hierarchies</topic><topic>Integrated circuit interconnections</topic><topic>interconnect architectures</topic><topic>Latches</topic><topic>leakage</topic><topic>Metals</topic><topic>Microprocessors</topic><topic>multiprocessors</topic><topic>power</topic><topic>Power consumption</topic><topic>ring</topic><topic>Switches</topic><topic>Switching</topic><topic>Timing</topic><topic>Wire</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Golander, Amit</creatorcontrib><creatorcontrib>Levison, Nadav</creatorcontrib><creatorcontrib>Heymann, Omer</creatorcontrib><creatorcontrib>Briskman, Alexander</creatorcontrib><creatorcontrib>Wolski, Mark J</creatorcontrib><creatorcontrib>Robinson, Eric F</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 
2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on circuits and systems. I, Regular papers</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Golander, Amit</au><au>Levison, Nadav</au><au>Heymann, Omer</au><au>Briskman, Alexander</au><au>Wolski, Mark J</au><au>Robinson, Eric F</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Cost-Efficient L1-L2 Multicore Interconnect: Performance, Power, and Area Considerations</atitle><jtitle>IEEE transactions on circuits and systems. I, Regular papers</jtitle><stitle>TCSI</stitle><date>2011-03</date><risdate>2011</risdate><volume>58</volume><issue>3</issue><spage>529</spage><epage>538</epage><pages>529-538</pages><issn>1549-8328</issn><eissn>1558-0806</eissn><coden>ITCSCH</coden><abstract>Processor manufacturers use advances in manufacturing technologies to increase the number of cores on chip in order to scale performance in a cost-efficient manner. As the number of cores scales up, not all cores can be directly connected to the main memory and there is a need for hierarchy, for example, by arranging them in clusters that share L2 caches. This paper focuses on designing cost-efficient L1-L2 interconnects. We discuss performance and power- and area-consumption considerations for a real processor designed in 45-nm technology. 
We explain the architectures and heuristics developed, including a smart floorplan with instance flips to address interconnect latency, customized decentralized arbitration schemes tailored per transaction type, and heterogeneous Vt device assignment to reduce overall power consumption, taking into account the expected switching factors. These and other methods worked together to achieve high throughput in a power-efficient interconnect that consumes less than 3% of the compute cluster area.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSI.2010.2073832</doi><tpages>10</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1549-8328
ispartof	IEEE transactions on circuits and systems. I, Regular papers, 2011-03, Vol.58 (3), p.529-538
issn	1549-8328 1558-0806
language	eng
recordid	cdi_ieee_primary_5644727
source	IEEE Electronic Library (IEL)
subjects	Arbitration; Chip multiprocessor (CMP); Circuits; Clocks; Clusters; computer architecture; Consumption; Hierarchies; Integrated circuit interconnections; interconnect architectures; Latches; leakage; Metals; Microprocessors; multiprocessors; power; Power consumption; ring; Switches; Switching; Timing; Wire
title	A Cost-Efficient L1-L2 Multicore Interconnect: Performance, Power, and Area Considerations
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T06%3A26%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Cost-Efficient%20L1-L2%20Multicore%20Interconnect:%20Performance,%20Power,%20and%20Area%20Considerations&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems.%20I,%20Regular%20papers&rft.au=Golander,%20Amit&rft.date=2011-03&rft.volume=58&rft.issue=3&rft.spage=529&rft.epage=538&rft.pages=529-538&rft.issn=1549-8328&rft.eissn=1558-0806&rft.coden=ITCSCH&rft_id=info:doi/10.1109/TCSI.2010.2073832&rft_dat=%3Cproquest_RIE%3E1671228585%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=853864993&rft_id=info:pmid/&rft_ieee_id=5644727&rfr_iscdi=true