Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor
Barrier synchronization is an important and performance critical primitive in many parallel programming models, including the popular OpenMP model. In this paper, we compare the performance of several software implementations of barrier synchronization and introduce a new implementation, distributed...
Main authors: | Zhang, Guansong ; Martínez, Francisco ; Tal, Arie ; Blainey, Bob |
---|---|
Format: | Book chapter |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 98 |
---|---|
container_issue | |
container_start_page | 84 |
container_title | |
container_volume | 2716 |
creator | Zhang, Guansong ; Martínez, Francisco ; Tal, Arie ; Blainey, Bob |
description | Barrier synchronization is an important and performance critical primitive in many parallel programming models, including the popular OpenMP model. In this paper, we compare the performance of several software implementations of barrier synchronization and introduce a new implementation, distributed counters with local sensor, which considerably reduces overhead on POWER3 and POWER4 SMP systems. Through experiments with the EPCC OpenMP benchmark, we demonstrate a 79% reduction in overhead on a 32-way POWER4 system and an 87% reduction in overhead on a 16-way POWER3 system when comparing with a fetch-and-add implementation. Since these improvements are primarily attributed to reduced L2 and L3 cache misses, we expect the relative performance of our implementation to increase with the number of processors in an SMP and as memory latencies lengthen relative to cache latencies. |
doi_str_mv | 10.1007/3-540-45009-2_7 |
format | Book Chapter |
fullrecord | <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_pascalfrancis_primary_15691719</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EBC3036565_10_92</sourcerecordid><originalsourceid>FETCH-LOGICAL-p309t-a5c9b62b8aa531235859e0556fc1c17a161974ba4d90863e4a6f5ad37e350abf3</originalsourceid><addsrcrecordid>eNqNUMluGzEMVbqhbuJzrnPJUS011DI6Jk43wGgPadDeBM5YE6txNI4ko3C_vrKTDygvBMi3kI-xcwHvBYD5gFxJ4FIBWN46c8Lm1nRYZ8dR-4LNhBaCI0r7kr07LkCi-vWKzQCh5dZIfMNmUhvTdUbat2ye82-ohaAE2hn7drXLe_6TQmmuKKXgU3Ozj8M6TTH8pRKm2NzmEO-a65BLCv2u-FWzmHax-JSbP6Gsm-U00Ka58TFP6Yy9HmmT_fy5n7LbTx9_LL7w5ffPXxeXS75FsIWTGmyv274jUihaVJ2yHpTS4yAGYah-VU_vSa4sdBq9JD0qWqHxqID6EU_ZxZPulnJ1HxPFIWS3TeGB0t4Jpa0wwlYcf8Lluop3Prl-mu6zE-AOATt0NTN3TNPVgCu-fdZN0-PO5-L8gTD4WBJthjVtD3873YEGA85qZ_F_SQiolVYHb9viP1ijicc</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>book_chapter</recordtype><pqid>EBC3036565_10_92</pqid></control><display><type>book_chapter</type><title>Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor</title><source>Springer Books</source><creator>Zhang, Guansong ; Martínez, Francisco ; Tal, Arie ; Blainey, Bob</creator><contributor>Voss, Michael J ; Voss, Michael J.</contributor><creatorcontrib>Zhang, Guansong ; Martínez, Francisco ; Tal, Arie ; Blainey, Bob ; Voss, Michael J ; Voss, Michael J.</creatorcontrib><description>Barrier synchronization is an important and performance critical primitive in many parallel programming models, including the popular OpenMP model. In this paper, we compare the performance of several software implementations of barrier synchronization and introduce a new implementation, distributed counters with local sensor, which considerably reduces overhead on POWER3 and POWER4 SMP systems. 
Through experiments with the EPCC OpenMP benchmark, we demonstrate a 79% reduction in overhead on a 32-way POWER4 system and an 87% reduction in overhead on a 16-way POWER3 system when comparing with a fetch-and-add implementation. Since these improvements are primarily attributed to reduced L2 and L3 cache misses, we expect the relative performance of our implementation to increase with the number of processors in an SMP and as memory latencies lengthen relative to cache latencies.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 354040435X</identifier><identifier>ISBN: 9783540404354</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 9783540450092</identifier><identifier>EISBN: 3540450092</identifier><identifier>DOI: 10.1007/3-540-45009-2_7</identifier><identifier>OCLC: 467788749</identifier><identifier>OCLC: 52371088</identifier><identifier>LCCallNum: QA76.642 -- .I589 2003eb</identifier><language>eng</language><publisher>Germany: Springer Berlin / Heidelberg</publisher><subject>Applied sciences ; Artificial intelligence ; Barrier ; Computer science; control theory; systems ; distributed counter ; Exact sciences and technology ; multiprocessor ; Pattern recognition. Digital image processing. Computational geometry ; Software ; Speech and sound recognition and synthesis. 
Linguistics ; synchronization</subject><ispartof>Lecture notes in computer science, 2003, Vol.2716, p.84-98</ispartof><rights>Springer-Verlag Berlin Heidelberg 2003</rights><rights>2004 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><relation>Lecture Notes in Computer Science</relation></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://ebookcentral.proquest.com/covers/3036565-l.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/3-540-45009-2_7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/3-540-45009-2_7$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,779,780,784,789,790,793,4050,4051,27925,38255,41442,42511</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=15691719$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Voss, Michael J</contributor><contributor>Voss, Michael J.</contributor><creatorcontrib>Zhang, Guansong</creatorcontrib><creatorcontrib>Martínez, Francisco</creatorcontrib><creatorcontrib>Tal, Arie</creatorcontrib><creatorcontrib>Blainey, Bob</creatorcontrib><title>Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor</title><title>Lecture notes in computer science</title><description>Barrier synchronization is an important and performance critical primitive in many parallel programming models, including the popular OpenMP model. In this paper, we compare the performance of several software implementations of barrier synchronization and introduce a new implementation, distributed counters with local sensor, which considerably reduces overhead on POWER3 and POWER4 SMP systems. 
Through experiments with the EPCC OpenMP benchmark, we demonstrate a 79% reduction in overhead on a 32-way POWER4 system and an 87% reduction in overhead on a 16-way POWER3 system when comparing with a fetch-and-add implementation. Since these improvements are primarily attributed to reduced L2 and L3 cache misses, we expect the relative performance of our implementation to increase with the number of processors in an SMP and as memory latencies lengthen relative to cache latencies.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Barrier</subject><subject>Computer science; control theory; systems</subject><subject>distributed counter</subject><subject>Exact sciences and technology</subject><subject>multiprocessor</subject><subject>Pattern recognition. Digital image processing. Computational geometry</subject><subject>Software</subject><subject>Speech and sound recognition and synthesis. Linguistics</subject><subject>synchronization</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>354040435X</isbn><isbn>9783540404354</isbn><isbn>9783540450092</isbn><isbn>3540450092</isbn><fulltext>true</fulltext><rsrctype>book_chapter</rsrctype><creationdate>2003</creationdate><recordtype>book_chapter</recordtype><recordid>eNqNUMluGzEMVbqhbuJzrnPJUS011DI6Jk43wGgPadDeBM5YE6txNI4ko3C_vrKTDygvBMi3kI-xcwHvBYD5gFxJ4FIBWN46c8Lm1nRYZ8dR-4LNhBaCI0r7kr07LkCi-vWKzQCh5dZIfMNmUhvTdUbat2ye82-ohaAE2hn7drXLe_6TQmmuKKXgU3Ozj8M6TTH8pRKm2NzmEO-a65BLCv2u-FWzmHax-JSbP6Gsm-U00Ka58TFP6Yy9HmmT_fy5n7LbTx9_LL7w5ffPXxeXS75FsIWTGmyv274jUihaVJ2yHpTS4yAGYah-VU_vSa4sdBq9JD0qWqHxqID6EU_ZxZPulnJ1HxPFIWS3TeGB0t4Jpa0wwlYcf8Lluop3Prl-mu6zE-AOATt0NTN3TNPVgCu-fdZN0-PO5-L8gTD4WBJthjVtD3873YEGA85qZ_F_SQiolVYHb9viP1ijicc</recordid><startdate>2003</startdate><enddate>2003</enddate><creator>Zhang, Guansong</creator><creator>Martínez, Francisco</creator><creator>Tal, Arie</creator><creator>Blainey, Bob</creator><general>Springer Berlin / 
Heidelberg</general><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>FFUUA</scope><scope>IQODW</scope></search><sort><creationdate>2003</creationdate><title>Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor</title><author>Zhang, Guansong ; Martínez, Francisco ; Tal, Arie ; Blainey, Bob</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p309t-a5c9b62b8aa531235859e0556fc1c17a161974ba4d90863e4a6f5ad37e350abf3</frbrgroupid><rsrctype>book_chapters</rsrctype><prefilter>book_chapters</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Barrier</topic><topic>Computer science; control theory; systems</topic><topic>distributed counter</topic><topic>Exact sciences and technology</topic><topic>multiprocessor</topic><topic>Pattern recognition. Digital image processing. Computational geometry</topic><topic>Software</topic><topic>Speech and sound recognition and synthesis. 
Linguistics</topic><topic>synchronization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Guansong</creatorcontrib><creatorcontrib>Martínez, Francisco</creatorcontrib><creatorcontrib>Tal, Arie</creatorcontrib><creatorcontrib>Blainey, Bob</creatorcontrib><collection>ProQuest Ebook Central - Book Chapters - Demo use only</collection><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Guansong</au><au>Martínez, Francisco</au><au>Tal, Arie</au><au>Blainey, Bob</au><au>Voss, Michael J</au><au>Voss, Michael J.</au><format>book</format><genre>bookitem</genre><ristype>CHAP</ristype><atitle>Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor</atitle><btitle>Lecture notes in computer science</btitle><seriestitle>Lecture Notes in Computer Science</seriestitle><date>2003</date><risdate>2003</risdate><volume>2716</volume><spage>84</spage><epage>98</epage><pages>84-98</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>354040435X</isbn><isbn>9783540404354</isbn><eisbn>9783540450092</eisbn><eisbn>3540450092</eisbn><abstract>Barrier synchronization is an important and performance critical primitive in many parallel programming models, including the popular OpenMP model. In this paper, we compare the performance of several software implementations of barrier synchronization and introduce a new implementation, distributed counters with local sensor, which considerably reduces overhead on POWER3 and POWER4 SMP systems. Through experiments with the EPCC OpenMP benchmark, we demonstrate a 79% reduction in overhead on a 32-way POWER4 system and an 87% reduction in overhead on a 16-way POWER3 system when comparing with a fetch-and-add implementation. 
Since these improvements are primarily attributed to reduced L2 and L3 cache misses, we expect the relative performance of our implementation to increase with the number of processors in an SMP and as memory latencies lengthen relative to cache latencies.</abstract><cop>Germany</cop><pub>Springer Berlin / Heidelberg</pub><doi>10.1007/3-540-45009-2_7</doi><oclcid>467788749</oclcid><oclcid>52371088</oclcid><tpages>15</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0302-9743 |
ispartof | Lecture notes in computer science, 2003, Vol.2716, p.84-98 |
issn | 0302-9743 ; 1611-3349 |
language | eng |
recordid | cdi_pascalfrancis_primary_15691719 |
source | Springer Books |
subjects | Applied sciences ; Artificial intelligence ; Barrier ; Computer science; control theory; systems ; distributed counter ; Exact sciences and technology ; multiprocessor ; Pattern recognition. Digital image processing. Computational geometry ; Software ; Speech and sound recognition and synthesis. Linguistics ; synchronization |
title | Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T14%3A35%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Busy-Wait%20Barrier%20Synchronization%20Using%20Distributed%20Counters%20with%20Local%20Sensor&rft.btitle=Lecture%20notes%20in%20computer%20science&rft.au=Zhang,%20Guansong&rft.date=2003&rft.volume=2716&rft.spage=84&rft.epage=98&rft.pages=84-98&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=354040435X&rft.isbn_list=9783540404354&rft_id=info:doi/10.1007/3-540-45009-2_7&rft_dat=%3Cproquest_pasca%3EEBC3036565_10_92%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540450092&rft.eisbn_list=3540450092&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC3036565_10_92&rft_id=info:pmid/&rfr_iscdi=true |