TLSync: support for multiple fast barriers using on-chip transmission lines
Main authors: | Oh, Jungju; Prvulovic, Milos; Zajic, Alenka |
---|---|
Format: | Conference proceedings |
Language: | eng |
Keywords: | |
container_end_page | 116 |
---|---|
container_issue | |
container_start_page | 105 |
container_title | |
container_volume | |
creator | Oh, Jungju; Prvulovic, Milos; Zajic, Alenka |
description | As the number of cores on a single chip grows, scalable barrier synchronization becomes increasingly difficult to implement. In software implementations, such as the tournament barrier, a larger number of cores results in both a longer latency for each round and a larger number of rounds. Hardware barrier implementations require significant dedicated wiring, e.g., a reduction (arrival) tree and a notification (release) tree, and multiple instances of this wiring are needed to support multiple barriers (e.g., when multiple parallel applications execute concurrently).
This paper presents TLSync, a novel hardware barrier implementation that uses the high-frequency part of the spectrum in a transmission-line broadcast network, thus leaving the transmission-line network available for non-modulated (baseband) data transmission. In contrast to other hardware barrier implementations, TLSync allows multiple thread groups to each have their own barrier. This is accomplished by allocating a different band of the radio-frequency spectrum to each group. Our circuit-level and electromagnetic models show that the worst-case latency of a TLSync barrier is 4 ns to 10 ns, depending on the width of the frequency band allocated to each group, and our cycle-accurate architectural simulations show that low-latency TLSync barriers provide significant performance and scalability benefits for barrier-intensive applications. |
doi_str_mv | 10.1145/2000064.2000078 |
format | Conference Proceeding |
isbn | 9781450304726 1450304729 |
eisbn | 9781450304726 1450304729 |
publisher | New York, NY, USA: ACM |
date | 2011-06-04 |
rights | 2011 ACM |
link | https://ieeexplore.ieee.org/document/6306757 |
identifier | ISSN: 1063-6897 |
ispartof | 2011 38th Annual International Symposium on Computer Architecture (ISCA), 2011, p.105-116 |
issn | 1063-6897 2575-713X |
language | eng |
recordid | cdi_acm_books_10_1145_2000064_2000078 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Computer systems organization -- Architectures -- Parallel architectures -- Multiple instruction, multiple data; Delay; Frequency modulation; Power transmission lines; Receivers; Synchronization; Transmitters; Wires |
title | TLSync: support for multiple fast barriers using on-chip transmission lines |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T15%3A04%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=TLSync:%20support%20for%20multiple%20fast%20barriers%20using%20on-chip%20transmission%20lines&rft.btitle=2011%2038th%20Annual%20International%20Symposium%20on%20Computer%20Architecture%20(ISCA)&rft.au=Oh,%20Jungju&rft.date=2011-06-04&rft.spage=105&rft.epage=116&rft.pages=105-116&rft.issn=1063-6897&rft.eissn=2575-713X&rft.isbn=9781450304726&rft.isbn_list=1450304729&rft_id=info:doi/10.1145/2000064.2000078&rft_dat=%3Cacm_6IE%3Eacm_books_10_1145_2000064_2000078%3C/acm_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781450304726&rft.eisbn_list=1450304729&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6306757&rfr_iscdi=true |
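The description's point that a software tournament barrier needs more rounds as the core count grows can be made concrete with a small sketch. It assumes a static binary tournament (an illustrative pairing rule chosen here, not TLSync's hardware mechanism): in round k, thread i with i divisible by 2^(k+1) waits for thread i + 2^k, so the arrival phase takes ceil(log2 N) rounds for N threads. The function names `tournament_schedule` and `tournament_rounds` are hypothetical.

```python
def tournament_schedule(n: int) -> list[list[tuple[int, int]]]:
    """Return, per round, the (winner, loser) pairs of a static binary tournament barrier.

    Illustrative pairing rule (an assumption, not taken from the paper):
    in round k, thread i with i % 2**(k+1) == 0 waits for thread i + 2**k,
    if that thread exists. Thread 0 wins the final round and would then
    start the release (wake-up) phase, which is not modelled here.
    """
    rounds = []
    step = 1
    while step < n:
        pairs = [(i, i + step) for i in range(0, n, 2 * step) if i + step < n]
        rounds.append(pairs)
        step *= 2
    return rounds


def tournament_rounds(n: int) -> int:
    """Number of arrival rounds, which equals ceil(log2(n)) for n >= 1."""
    return len(tournament_schedule(n))


if __name__ == "__main__":
    # Round count grows logarithmically with the number of cores.
    for cores in (2, 16, 64, 256, 1024):
        print(cores, tournament_rounds(cores))
```

Running it prints 1, 4, 6, 8 and 10 rounds for 2, 16, 64, 256 and 1024 cores. The sketch counts rounds only; it does not model the per-round communication latency, which the description notes also grows with the number of cores.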