Power-Performance Comparison of Single-Task Driven Many-Cores
Main authors: | Keceli, F.; Moreshet, T.; Vishkin, U. |
---|---|
Format: | Conference proceedings |
Language: | eng |
container_end_page | 355 |
---|---|
container_issue | |
container_start_page | 348 |
container_title | |
container_volume | |
creator | Keceli, F.; Moreshet, T.; Vishkin, U. |
description | Many-cores, processors with hundreds of cores, are becoming increasingly popular in general-purpose computing, yet power is a limiting factor in their performance. In this paper, we compare the power and performance of two design points in the many-core processor domain. The XMT general-purpose processor provides a significant runtime advantage on irregular parallel programs (e.g., graph algorithms). Prior work demonstrated this advantage and attributed it to XMT's architectural choices and ease of programming. In contrast, current commercial GPUs excel at regular parallel programs that require high processing capability. In this work, we set the power envelope as a constraint and evaluate an envisioned 1024-core XMT processor against an NVIDIA GTX280 GPU, considering various scenarios for estimating the power of the XMT chip. Even under worst-case assumptions and scenarios, simulations show that the XMT processor sustains its advantage over the GPU on irregular parallel programs, while not falling significantly behind on regular programs. The total energy spent per benchmark follows a similar pattern. Given that the two architectures target different types of parallelism, a future system could use an XMT chip and a GPU chip in complementary roles. |
doi_str_mv | 10.1109/ICPADS.2011.101 |
format | Conference Proceeding |
isbn | 1457718758; 9781457718755 |
eisbn | 9780769545769; 0769545769 |
publisher | IEEE |
date | 2011-12 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1521-9097 |
ispartof | 2011 IEEE 17th International Conference on Parallel and Distributed Systems, 2011, p.348-355 |
issn | 1521-9097 (print); 2690-5965 (electronic) |
language | eng |
recordid | cdi_ieee_primary_6121297 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Benchmark testing; Clocks; Computer architecture; GPU; Graphics processing unit; many-core; parallelism; power and performance comparison; PRAM; Random access memory; Temperature measurement; XMT |
title | Power-Performance Comparison of Single-Task Driven Many-Cores |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T08%3A10%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Power-Performance%20Comparison%20of%20Single-Task%20Driven%20Many-Cores&rft.btitle=2011%20IEEE%2017th%20International%20Conference%20on%20Parallel%20and%20Distributed%20Systems&rft.au=Keceli,%20F.&rft.date=2011-12&rft.spage=348&rft.epage=355&rft.pages=348-355&rft.issn=1521-9097&rft.eissn=2690-5965&rft.isbn=1457718758&rft.isbn_list=9781457718755&rft_id=info:doi/10.1109/ICPADS.2011.101&rft_dat=%3Cieee_6IE%3E6121297%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769545769&rft.eisbn_list=0769545769&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6121297&rfr_iscdi=true |