TLSync: support for multiple fast barriers using on-chip transmission lines


Bibliographic details
Main authors: Oh, Jungju; Prvulovic, Milos; Zajic, Alenka
Format: Conference proceedings
Language: English
container_start_page 105
container_end_page 116
creator Oh, Jungju
Prvulovic, Milos
Zajic, Alenka
description As the number of cores on a single chip grows, scalable barrier synchronization becomes increasingly difficult to implement. In software implementations, such as the tournament barrier, a larger number of cores results in a longer latency for each round and a larger number of rounds. Hardware barrier implementations require significant dedicated wiring, e.g., using a reduction (arrival) tree and a notification (release) tree, and multiple instances of this wiring are needed to support multiple barriers (e.g., when concurrently executing multiple parallel applications). This paper presents TLSync, a novel hardware barrier implementation that uses the high-frequency part of the spectrum in a transmission-line broadcast network, thus leaving the transmission-line network free for non-modulated (baseband) data transmission. In contrast to other implementations of hardware barriers, TLSync allows multiple thread groups to each have their own barrier. This is accomplished by allocating different bands in the radio-frequency spectrum to different groups. Our circuit-level and electromagnetic models show that the worst-case latency for a TLSync barrier is 4 ns to 10 ns, depending on the size of the frequency band allocated to each group, and our cycle-accurate architectural simulations show that low-latency TLSync barriers provide significant performance and scalability benefits to barrier-intensive applications.
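The abstract's point that a software tournament barrier needs more rounds as the core count grows can be illustrated with a minimal sketch. It assumes the classic statically paired tournament, in which thread i waits for thread i + 2^r in arrival round r. The function name `tournament_schedule` is hypothetical and does not come from the paper. The sketch only computes the pairing schedule rather than performing real synchronization, and it does not model the longer per-round latency that the abstract also mentions.

```python
def tournament_schedule(n):
    """Return the arrival-phase pairings of a statically paired tournament barrier.

    Hypothetical illustration (not the paper's code): in round r, every thread i
    with i % 2**(r + 1) == 0 "wins" and waits for thread i + 2**r, if that
    thread exists. Thread 0 ends up as the champion that would start the release.
    """
    if n < 1:
        raise ValueError("need at least one thread")
    rounds = []
    step = 1  # distance between paired threads, doubles every round
    while step < n:
        pairs = [(i, i + step) for i in range(0, n, 2 * step) if i + step < n]
        rounds.append(pairs)
        step *= 2
    return rounds


if __name__ == "__main__":
    for cores in (4, 16, 64):
        sched = tournament_schedule(cores)
        # The number of arrival rounds equals ceil(log2(cores)).
        print(cores, "cores:", len(sched), "arrival rounds")
    # prints "4 cores: 2 arrival rounds", "16 cores: 4 arrival rounds",
    # and "64 cores: 6 arrival rounds"
```

Each non-champion thread loses exactly once, so the schedule contains n - 1 pairings in total. If the release (wake-up) phase reuses the same tree in reverse, the critical path roughly doubles to about 2·⌈log2 n⌉ rounds. This round count is the scaling cost that a hardware broadcast barrier is meant to avoid.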
doi_str_mv 10.1145/2000064.2000078
format Conference Proceeding
publisher ACM, New York, NY, USA
date 2011-06-04
pages 105-116 (12 pages)
isbn 9781450304726
1450304729
ieee_id 6306757
linktohtml https://ieeexplore.ieee.org/document/6306757
ispartof 2011 38th Annual International Symposium on Computer Architecture (ISCA), 2011, p.105-116
issn 1063-6897
2575-713X
language eng
recordid cdi_acm_books_10_1145_2000064_2000078
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Computer systems organization -- Architectures -- Parallel architectures -- Multiple instruction, multiple data
Delay
Frequency modulation
Power transmission lines
Receivers
Synchronization
Transmitters
Wires
title TLSync: support for multiple fast barriers using on-chip transmission lines
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T15%3A04%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=TLSync:%20support%20for%20multiple%20fast%20barriers%20using%20on-chip%20transmission%20lines&rft.btitle=2011%2038th%20Annual%20International%20Symposium%20on%20Computer%20Architecture%20(ISCA)&rft.au=Oh,%20Jungju&rft.date=2011-06-04&rft.spage=105&rft.epage=116&rft.pages=105-116&rft.issn=1063-6897&rft.eissn=2575-713X&rft.isbn=9781450304726&rft.isbn_list=1450304729&rft_id=info:doi/10.1145/2000064.2000078&rft_dat=%3Cacm_6IE%3Eacm_books_10_1145_2000064_2000078%3C/acm_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781450304726&rft.eisbn_list=1450304729&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6306757&rfr_iscdi=true