Evaluation of a multithreaded architecture for cellular computing
Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communi...
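Main authors: | Cascaval, C. ; Castanos, J.G. ; Ceze, L. ; Denneau, M. ; Gupta, M. ; Lieber, D. ; Moreira, J.E. ; Strauss, K. ; Warren, H.S. |
---|---|
Format: | Conference proceeding |
Language: | English |
Subjects: | |
Online access: | Order full text |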
container_end_page | 321 |
---|---|
container_issue | |
container_start_page | 311 |
container_title | |
container_volume | |
creator | Cascaval, C. ; Castanos, J.G. ; Ceze, L. ; Denneau, M. ; Gupta, M. ; Lieber, D. ; Moreira, J.E. ; Strauss, K. ; Warren, H.S. |
description | Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communications hardware. Massive intra-chip parallelism is used to tolerate memory and functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. In this paper, we describe the Cyclops architecture and evaluate two of its new hardware features: a memory hierarchy with a flexible cache organization and fast barrier hardware. Our experiments with the STREAM benchmark show that a particular design can achieve a sustainable memory bandwidth of 40 GB/s, equal to the peak hardware bandwidth and similar to the performance of a 128-processor SGI Origin 3800. For small vectors, we have observed in-cache bandwidth above 80 GB/s. We also show that the fast barrier hardware can improve the performance of the Splash-2 FFT kernel by up to 10%. Our results demonstrate that the Cyclops approach of integrating a large number of simple processing elements and multiple memory banks in the same chip is an effective alternative for designing high-performance systems. |
doi_str_mv | 10.1109/HPCA.2002.995720 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_995720</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>995720</ieee_id><sourcerecordid>995720</sourcerecordid><originalsourceid>FETCH-LOGICAL-i174t-984c54a2b8f3b14c00ddfe893275a0d2bf560552efdd6aa8cc4c94f93c216c073</originalsourceid><addsrcrecordid>eNotj0tLxDAYRYMPsI6zF1f5A61fXk2yLGV0hAFdKLgb0jycSDsd0lTw31sY7-YeuHDgInRPoCIE9OP2rW0qCkArrYWkcIEKyqQqKbDPS3QLstaCCCrUFSqIYFCC0vIGrafpG5bwZeSkQM3mx_SzyXE84jFgg4e5zzEfkjfOO2ySPcTsbZ6Tx2FM2Pq-n3uzwDic5hyPX3foOph-8uv_XqGPp817uy13r88vbbMrI5E8l1pxK7ihnQqsI9wCOBe80oxKYcDRLogahKA-OFcbo6zlVvOgmaWktiDZCj2cvdF7vz-lOJj0uz-fZ388GUxb</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Evaluation of a multithreaded architecture for cellular computing</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Cascaval, C. ; Castanos, J.G. ; Ceze, L. ; Denneau, M. ; Gupta, M. ; Lieber, D. ; Moreira, J.E. ; Strauss, K. ; Warren, H.S.</creator><creatorcontrib>Cascaval, C. ; Castanos, J.G. ; Ceze, L. ; Denneau, M. ; Gupta, M. ; Lieber, D. ; Moreira, J.E. ; Strauss, K. ; Warren, H.S.</creatorcontrib><description>Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communications hardware. Massive intra-chip parallelism is used to tolerate memory and functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. 
In this paper, we describe the Cyclops architecture and evaluate two of its new hardware features: a memory hierarchy with a flexible cache organization and fast barrier hardware. Our experiments with the STREAM benchmark show that a particular design can achieve a sustainable memory bandwidth of 40 GB/s, equal to the peak hardware bandwidth and similar to the performance of a 128-processor SGI Origin 3800. For small vectors, we have observed in-cache bandwidth above 80 GB/s. We also show that the fast barrier hardware can improve the performance of the Splash-2 FFT kernel by up to 10%. Our results demonstrate that the Cyclops approach of integrating a large number of simple processing elements and multiple memory banks in the same chip is an effective alternative for designing high-performance systems.</description><identifier>ISSN: 1530-0897</identifier><identifier>ISBN: 0769515258</identifier><identifier>ISBN: 9780769515250</identifier><identifier>EISSN: 2378-203X</identifier><identifier>DOI: 10.1109/HPCA.2002.995720</identifier><language>eng</language><publisher>IEEE</publisher><subject>Bandwidth ; Computer architecture ; Concurrent computing ; Delay ; Hardware ; High performance computing ; Logic ; Parallel processing ; Silicon ; Yarn</subject><ispartof>Proceedings Eighth International Symposium on High Performance Computer Architecture, 2002, p.311-321</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/995720$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2051,4035,4036,27904,54899</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/995720$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Cascaval, C.</creatorcontrib><creatorcontrib>Castanos, 
J.G.</creatorcontrib><creatorcontrib>Ceze, L.</creatorcontrib><creatorcontrib>Denneau, M.</creatorcontrib><creatorcontrib>Gupta, M.</creatorcontrib><creatorcontrib>Lieber, D.</creatorcontrib><creatorcontrib>Moreira, J.E.</creatorcontrib><creatorcontrib>Strauss, K.</creatorcontrib><creatorcontrib>Warren, H.S.</creatorcontrib><title>Evaluation of a multithreaded architecture for cellular computing</title><title>Proceedings Eighth International Symposium on High Performance Computer Architecture</title><addtitle>HPCA</addtitle><description>Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communications hardware. Massive intra-chip parallelism is used to tolerate memory and functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. In this paper, we describe the Cyclops architecture and evaluate two of its new hardware features: a memory hierarchy with a flexible cache organization and fast barrier hardware. Our experiments with the STREAM benchmark show that a particular design can achieve a sustainable memory bandwidth of 40 GB/s, equal to the peak hardware bandwidth and similar to the performance of a 128-processor SGI Origin 3800. For small vectors, we have observed in-cache bandwidth above 80 GB/s. We also show that the fast barrier hardware can improve the performance of the Splash-2 FFT kernel by up to 10%. 
Our results demonstrate that the Cyclops approach of integrating a large number of simple processing elements and multiple memory banks in the same chip is an effective alternative for designing high-performance systems.</description><subject>Bandwidth</subject><subject>Computer architecture</subject><subject>Concurrent computing</subject><subject>Delay</subject><subject>Hardware</subject><subject>High performance computing</subject><subject>Logic</subject><subject>Parallel processing</subject><subject>Silicon</subject><subject>Yarn</subject><issn>1530-0897</issn><issn>2378-203X</issn><isbn>0769515258</isbn><isbn>9780769515250</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2002</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotj0tLxDAYRYMPsI6zF1f5A61fXk2yLGV0hAFdKLgb0jycSDsd0lTw31sY7-YeuHDgInRPoCIE9OP2rW0qCkArrYWkcIEKyqQqKbDPS3QLstaCCCrUFSqIYFCC0vIGrafpG5bwZeSkQM3mx_SzyXE84jFgg4e5zzEfkjfOO2ySPcTsbZ6Tx2FM2Pq-n3uzwDic5hyPX3foOph-8uv_XqGPp817uy13r88vbbMrI5E8l1pxK7ihnQqsI9wCOBe80oxKYcDRLogahKA-OFcbo6zlVvOgmaWktiDZCj2cvdF7vz-lOJj0uz-fZ388GUxb</recordid><startdate>2002</startdate><enddate>2002</enddate><creator>Cascaval, C.</creator><creator>Castanos, J.G.</creator><creator>Ceze, L.</creator><creator>Denneau, M.</creator><creator>Gupta, M.</creator><creator>Lieber, D.</creator><creator>Moreira, J.E.</creator><creator>Strauss, K.</creator><creator>Warren, H.S.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>2002</creationdate><title>Evaluation of a multithreaded architecture for cellular computing</title><author>Cascaval, C. ; Castanos, J.G. ; Ceze, L. ; Denneau, M. ; Gupta, M. ; Lieber, D. ; Moreira, J.E. ; Strauss, K. 
; Warren, H.S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i174t-984c54a2b8f3b14c00ddfe893275a0d2bf560552efdd6aa8cc4c94f93c216c073</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2002</creationdate><topic>Bandwidth</topic><topic>Computer architecture</topic><topic>Concurrent computing</topic><topic>Delay</topic><topic>Hardware</topic><topic>High performance computing</topic><topic>Logic</topic><topic>Parallel processing</topic><topic>Silicon</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Cascaval, C.</creatorcontrib><creatorcontrib>Castanos, J.G.</creatorcontrib><creatorcontrib>Ceze, L.</creatorcontrib><creatorcontrib>Denneau, M.</creatorcontrib><creatorcontrib>Gupta, M.</creatorcontrib><creatorcontrib>Lieber, D.</creatorcontrib><creatorcontrib>Moreira, J.E.</creatorcontrib><creatorcontrib>Strauss, K.</creatorcontrib><creatorcontrib>Warren, H.S.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cascaval, C.</au><au>Castanos, J.G.</au><au>Ceze, L.</au><au>Denneau, M.</au><au>Gupta, M.</au><au>Lieber, D.</au><au>Moreira, J.E.</au><au>Strauss, K.</au><au>Warren, H.S.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Evaluation of a multithreaded architecture for cellular computing</atitle><btitle>Proceedings Eighth International Symposium on High Performance Computer 
Architecture</btitle><stitle>HPCA</stitle><date>2002</date><risdate>2002</risdate><spage>311</spage><epage>321</epage><pages>311-321</pages><issn>1530-0897</issn><eissn>2378-203X</eissn><isbn>0769515258</isbn><isbn>9780769515250</isbn><abstract>Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communications hardware. Massive intra-chip parallelism is used to tolerate memory and functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. In this paper, we describe the Cyclops architecture and evaluate two of its new hardware features: a memory hierarchy with a flexible cache organization and fast barrier hardware. Our experiments with the STREAM benchmark show that a particular design can achieve a sustainable memory bandwidth of 40 GB/s, equal to the peak hardware bandwidth and similar to the performance of a 128-processor SGI Origin 3800. For small vectors, we have observed in-cache bandwidth above 80 GB/s. We also show that the fast barrier hardware can improve the performance of the Splash-2 FFT kernel by up to 10%. Our results demonstrate that the Cyclops approach of integrating a large number of simple processing elements and multiple memory banks in the same chip is an effective alternative for designing high-performance systems.</abstract><pub>IEEE</pub><doi>10.1109/HPCA.2002.995720</doi><tpages>11</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1530-0897 |
ispartof | Proceedings Eighth International Symposium on High Performance Computer Architecture, 2002, p.311-321 |
issn | 1530-0897 ; 2378-203X |
language | eng |
recordid | cdi_ieee_primary_995720 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Bandwidth ; Computer architecture ; Concurrent computing ; Delay ; Hardware ; High performance computing ; Logic ; Parallel processing ; Silicon ; Yarn |
title | Evaluation of a multithreaded architecture for cellular computing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T15%3A33%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Evaluation%20of%20a%20multithreaded%20architecture%20for%20cellular%20computing&rft.btitle=Proceedings%20Eighth%20International%20Symposium%20on%20High%20Performance%20Computer%20Architecture&rft.au=Cascaval,%20C.&rft.date=2002&rft.spage=311&rft.epage=321&rft.pages=311-321&rft.issn=1530-0897&rft.eissn=2378-203X&rft.isbn=0769515258&rft.isbn_list=9780769515250&rft_id=info:doi/10.1109/HPCA.2002.995720&rft_dat=%3Cieee_6IE%3E995720%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=995720&rfr_iscdi=true |
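The `url` field above is an OpenURL (`ctx_ver=Z39.88-2004`) whose query string carries the citation as percent-encoded key/value pairs. The following is a minimal sketch, using only Python's standard library, of how the `rft.*` keys could be decoded back into readable fields. The helper name `parse_openurl` is hypothetical, and the URL is a shortened copy of the one in this record, with most keys left out for brevity.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the record's resolver URL (only a few keys kept).
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=proceeding"
    "&rft.atitle=Evaluation%20of%20a%20multithreaded%20architecture"
    "%20for%20cellular%20computing"
    "&rft.au=Cascaval,%20C.&rft.date=2002&rft.spage=311&rft.epage=321"
    "&rft.issn=1530-0897&rft.isbn=0769515258"
    "&rft_id=info:doi/10.1109/HPCA.2002.995720"
)


def parse_openurl(url):
    """Hypothetical helper: collect the referent (rft.*) keys and the DOI."""
    # parse_qs percent-decodes the values and maps every key to a list,
    # because a key such as rft_id may occur more than once.
    query = parse_qs(urlsplit(url).query)
    referent = {
        key[len("rft."):]: values[0]
        for key, values in query.items()
        if key.startswith("rft.")
    }
    # The DOI travels as an identifier of the form info:doi/<doi>.
    for ident in query.get("rft_id", []):
        if ident.startswith("info:doi/"):
            referent["doi"] = ident[len("info:doi/"):]
    return referent


if __name__ == "__main__":
    for key, value in sorted(parse_openurl(OPENURL).items()):
        print(f"{key}: {value}")
```

Taking only the first value per `rft.*` key is a simplification for this sketch; a citation with several `rft.au` entries would need the full lists that `parse_qs` returns.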