Efficient Hardware Barrier Synchronization in Many-Core CMPs

Traditional software-based barrier implementations for shared memory parallel machines tend to produce hotspots in terms of memory and network contention as the number of processors increases. This could limit their applicability to future many-core CMPs in which possibly several dozens of cores wou...
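To illustrate the kind of software barrier the abstract refers to, here is a minimal sketch of a centralized sense-reversing barrier. It is not the baseline evaluated in the paper; the class name `CentralBarrier` and the helper `run_demo` are hypothetical names chosen for this example. Every thread updates the same counter and then polls the same flag, which is the shared-memory access pattern that tends to become a hotspot as the number of cores grows.

```python
import threading
import time


class CentralBarrier:
    """Centralized sense-reversing barrier (illustrative sketch only)."""

    def __init__(self, n):
        self.n = n                      # number of participating threads
        self.count = n                  # threads still expected in this phase
        self.sense = False              # shared flag, flipped once per phase
        self.lock = threading.Lock()    # protects the shared counter
        self.local = threading.local()  # per-thread copy of the sense

    def wait(self):
        # Each thread flips its private sense on every barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1             # every thread touches the same counter
            last = self.count == 0
            if last:
                self.count = self.n     # reset the counter for the next phase
                self.sense = my_sense   # release all waiting threads
        if not last:
            while self.sense != my_sense:
                time.sleep(0)           # poll the shared flag, yielding the CPU


def run_demo(n_threads=4, phases=3):
    """Return True if no thread left a phase before all threads arrived."""
    barrier = CentralBarrier(n_threads)
    arrived = [0] * phases
    checks = []
    guard = threading.Lock()

    def worker():
        for p in range(phases):
            with guard:
                arrived[p] += 1
            barrier.wait()
            with guard:
                checks.append(arrived[p] == n_threads)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(checks) == n_threads * phases and all(checks)


if __name__ == "__main__":
    print(run_demo())
```

On a cache-coherent machine, each update of the counter and each flip of the flag would trigger coherence traffic to every polling core. A dedicated signaling network such as the one GBarrier proposes is meant to avoid exactly that traffic.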

Detailed description

Bibliographic details
Published in:	IEEE transactions on parallel and distributed systems 2012-08, Vol.23 (8), p.1453-1466
Main authors:	Abellan, J. L., Fernandez, J., Acacio, M. E.
Format:	Article
Language:	eng
Subjects:	barrier synchronization ; cache coherence ; Control systems ; energy efficiency ; global lines ; Hardware ; Many-core CMPs ; Proposals ; Protocols ; Radiation detectors ; Registers ; S-CSMA ; scalability ; Synchronization
Online access:	Order full text

container_end_page	1466
container_issue	8
container_start_page	1453
container_title	IEEE transactions on parallel and distributed systems
container_volume	23
creator	Abellan, J. L. ; Fernandez, J. ; Acacio, M. E.
description	Traditional software-based barrier implementations for shared memory parallel machines tend to produce hotspots in terms of memory and network contention as the number of processors increases. This could limit their applicability to future many-core CMPs in which possibly several dozens of cores would need to be synchronized efficiently. In this work, we develop GBarrier, a hardware-based barrier mechanism especially aimed at providing efficient barriers in future many-core CMPs. Our proposal deploys a dedicated G-line-based network to allow for fast and efficient signaling of barrier arrival and departure. Since GBarrier does not have any influence on the memory system, we avoid all coherence activity and barrier-related network traffic that traditional approaches introduce and that restrict scalability. Through detailed simulations of a 32-core CMP, we compare GBarrier against one of the most efficient software-based barrier implementations for a set of kernels and scientific applications. Evaluation results show average reductions of 54 and 21 percent in execution time, 53 and 18 percent in network traffic, and also 76 and 31 percent in the energy-delay² product metric for the full CMP when the kernels and scientific applications, respectively, are considered.
doi_str_mv	10.1109/TPDS.2011.304
format	Article
fullrecord	<record><control><sourceid>crossref_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TPDS_2011_304</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6104036</ieee_id><sourcerecordid>10_1109_TPDS_2011_304</sourcerecordid><originalsourceid>FETCH-LOGICAL-c323t-cb1055644083ebdd514482fe4a13f7eabaa2389b1a6e0bfc1e6a22fde3c3e4b23</originalsourceid><addsrcrecordid>eNo9kE1Lw0AURQdRsFaXrtzkD0x8bz5iAm40Viu0WGhdh5fJGxzRRCYBqb_ehIqrexeHC-cKcYmQIkJxvds8bFMFiKkGcyRmaG0uFeb6eOxgrCwUFqfirO_fAdBYMDNxu_A-uMDtkCwpNt8UObmnGAPHZLtv3Vvs2vBDQ-jaJLTJmtq9LLsRKteb_lycePro-eIv5-L1cbErl3L18vRc3q2k00oP0tUI1mbGQK65bhqLxuTKsyHU_oapJlI6L2qkjKH2DjkjpXzD2mk2tdJzIQ-7LnZ9H9lXXzF8UtxXCNWkXk3q1aRejeojf3XgAzP_s9n4AehM_wJQ0lTg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Efficient Hardware Barrier Synchronization in Many-Core CMPs</title><source>IEEE Electronic Library (IEL)</source><creator>Abellan, J. L. ; Fernandez, J. ; Acacio, M. E.</creator><creatorcontrib>Abellan, J. L. ; Fernandez, J. ; Acacio, M. E.</creatorcontrib><description>Traditional software-based barrier implementations for shared memory parallel machines tend to produce hotspots in terms of memory and network contention as the number of processors increases. This could limit their applicability to future many-core CMPs in which possibly several dozens of cores would need to be synchronized efficiently. In this work, we develop GBarrier, a hardware-based barrier mechanism especially aimed at providing efficient barriers in future many-core CMPs. Our proposal deploys a dedicated G-line-based network to allow for fast and efficient signaling of barrier arrival and departure. Since GBarrier does not have any influence on the memory system, we avoid all coherence activity and barrier-related network traffic that traditional approaches introduce and that restrict scalability. 
Through detailed simulations of a 32-core CMP, we compare GBarrier against one of the most efficient software-based barrier implementations for a set of kernels and scientific applications. Evaluation results show average reductions of 54 and 21 percent in execution time, 53 and 18 percent in network traffic, and also 76 and 31 percent in the energy-delay 2 product metric for the full CMP when the kernels and scientific applications, respectively, are considered.</description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2011.304</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>IEEE</publisher><subject>barrier synchronization ; cache coherence ; Control systems ; energy efficiency ; global lines ; Hardware ; Many-core CMPs ; Proposals ; Protocols ; Radiation detectors ; Registers ; S-CSMA ; scalability ; Synchronization</subject><ispartof>IEEE transactions on parallel and distributed systems, 2012-08, Vol.23 (8), p.1453-1466</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c323t-cb1055644083ebdd514482fe4a13f7eabaa2389b1a6e0bfc1e6a22fde3c3e4b23</citedby><cites>FETCH-LOGICAL-c323t-cb1055644083ebdd514482fe4a13f7eabaa2389b1a6e0bfc1e6a22fde3c3e4b23</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6104036$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27922,27923,54756</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6104036$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Abellan, J. L.</creatorcontrib><creatorcontrib>Fernandez, J.</creatorcontrib><creatorcontrib>Acacio, M. 
E.</creatorcontrib><title>Efficient Hardware Barrier Synchronization in Many-Core CMPs</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description>Traditional software-based barrier implementations for shared memory parallel machines tend to produce hotspots in terms of memory and network contention as the number of processors increases. This could limit their applicability to future many-core CMPs in which possibly several dozens of cores would need to be synchronized efficiently. In this work, we develop GBarrier, a hardware-based barrier mechanism especially aimed at providing efficient barriers in future many-core CMPs. Our proposal deploys a dedicated G-line-based network to allow for fast and efficient signaling of barrier arrival and departure. Since GBarrier does not have any influence on the memory system, we avoid all coherence activity and barrier-related network traffic that traditional approaches introduce and that restrict scalability. Through detailed simulations of a 32-core CMP, we compare GBarrier against one of the most efficient software-based barrier implementations for a set of kernels and scientific applications. 
Evaluation results show average reductions of 54 and 21 percent in execution time, 53 and 18 percent in network traffic, and also 76 and 31 percent in the energy-delay 2 product metric for the full CMP when the kernels and scientific applications, respectively, are considered.</description><subject>barrier synchronization</subject><subject>cache coherence</subject><subject>Control systems</subject><subject>energy efficiency</subject><subject>global lines</subject><subject>Hardware</subject><subject>Many-core CMPs</subject><subject>Proposals</subject><subject>Protocols</subject><subject>Radiation detectors</subject><subject>Registers</subject><subject>S-CSMA</subject><subject>scalability</subject><subject>Synchronization</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE1Lw0AURQdRsFaXrtzkD0x8bz5iAm40Viu0WGhdh5fJGxzRRCYBqb_ehIqrexeHC-cKcYmQIkJxvds8bFMFiKkGcyRmaG0uFeb6eOxgrCwUFqfirO_fAdBYMDNxu_A-uMDtkCwpNt8UObmnGAPHZLtv3Vvs2vBDQ-jaJLTJmtq9LLsRKteb_lycePro-eIv5-L1cbErl3L18vRc3q2k00oP0tUI1mbGQK65bhqLxuTKsyHU_oapJlI6L2qkjKH2DjkjpXzD2mk2tdJzIQ-7LnZ9H9lXXzF8UtxXCNWkXk3q1aRejeojf3XgAzP_s9n4AehM_wJQ0lTg</recordid><startdate>20120801</startdate><enddate>20120801</enddate><creator>Abellan, J. L.</creator><creator>Fernandez, J.</creator><creator>Acacio, M. E.</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20120801</creationdate><title>Efficient Hardware Barrier Synchronization in Many-Core CMPs</title><author>Abellan, J. L. ; Fernandez, J. ; Acacio, M. 
E.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c323t-cb1055644083ebdd514482fe4a13f7eabaa2389b1a6e0bfc1e6a22fde3c3e4b23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>barrier synchronization</topic><topic>cache coherence</topic><topic>Control systems</topic><topic>energy efficiency</topic><topic>global lines</topic><topic>Hardware</topic><topic>Many-core CMPs</topic><topic>Proposals</topic><topic>Protocols</topic><topic>Radiation detectors</topic><topic>Registers</topic><topic>S-CSMA</topic><topic>scalability</topic><topic>Synchronization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Abellan, J. L.</creatorcontrib><creatorcontrib>Fernandez, J.</creatorcontrib><creatorcontrib>Acacio, M. E.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Abellan, J. L.</au><au>Fernandez, J.</au><au>Acacio, M. 
E.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient Hardware Barrier Synchronization in Many-Core CMPs</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2012-08-01</date><risdate>2012</risdate><volume>23</volume><issue>8</issue><spage>1453</spage><epage>1466</epage><pages>1453-1466</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract>Traditional software-based barrier implementations for shared memory parallel machines tend to produce hotspots in terms of memory and network contention as the number of processors increases. This could limit their applicability to future many-core CMPs in which possibly several dozens of cores would need to be synchronized efficiently. In this work, we develop GBarrier, a hardware-based barrier mechanism especially aimed at providing efficient barriers in future many-core CMPs. Our proposal deploys a dedicated G-line-based network to allow for fast and efficient signaling of barrier arrival and departure. Since GBarrier does not have any influence on the memory system, we avoid all coherence activity and barrier-related network traffic that traditional approaches introduce and that restrict scalability. Through detailed simulations of a 32-core CMP, we compare GBarrier against one of the most efficient software-based barrier implementations for a set of kernels and scientific applications. Evaluation results show average reductions of 54 and 21 percent in execution time, 53 and 18 percent in network traffic, and also 76 and 31 percent in the energy-delay 2 product metric for the full CMP when the kernels and scientific applications, respectively, are considered.</abstract><pub>IEEE</pub><doi>10.1109/TPDS.2011.304</doi><tpages>14</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1045-9219
ispartof	IEEE transactions on parallel and distributed systems, 2012-08, Vol.23 (8), p.1453-1466
issn	1045-9219 1558-2183
language	eng
recordid	cdi_crossref_primary_10_1109_TPDS_2011_304
source	IEEE Electronic Library (IEL)
subjects	barrier synchronization ; cache coherence ; Control systems ; energy efficiency ; global lines ; Hardware ; Many-core CMPs ; Proposals ; Protocols ; Radiation detectors ; Registers ; S-CSMA ; scalability ; Synchronization
title	Efficient Hardware Barrier Synchronization in Many-Core CMPs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T20%3A26%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Hardware%20Barrier%20Synchronization%20in%20Many-Core%20CMPs&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Abellan,%20J.%20L.&rft.date=2012-08-01&rft.volume=23&rft.issue=8&rft.spage=1453&rft.epage=1466&rft.pages=1453-1466&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2011.304&rft_dat=%3Ccrossref_RIE%3E10_1109_TPDS_2011_304%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6104036&rfr_iscdi=true