Power-Performance Comparison of Single-Task Driven Many-Cores

Many-cores, processors with 100s of cores, are becoming increasingly popular in general-purpose computing, yet power is a limiting factor in their performance. In this paper, we compare the power and performance of two design points in the many-core processor domain. The XMT general-purpose processo...

Bibliographic details
Main authors: Keceli, F., Moreshet, T., Vishkin, U.
Format: Conference proceeding
Language: eng
container_end_page 355
container_issue
container_start_page 348
container_title
container_volume
creator Keceli, F.
Moreshet, T.
Vishkin, U.
description Many-cores, processors with 100s of cores, are becoming increasingly popular in general-purpose computing, yet power is a limiting factor in their performance. In this paper, we compare the power and performance of two design points in the many-core processor domain. The XMT general-purpose processor provides significant runtime advantage on irregular parallel programs (e.g., graph algorithms). This was previously demonstrated and tied to its architecture choices and ease-of-programming. In contrast, current commercial GPUs excel at regular parallel programs that require high processing capability. In this work, we set the power envelope as a constraint and evaluate an envisioned 1024-core XMT processor against an NVIDIA GTX280 GPU considering various scenarios for estimating the power of the XMT chip. Even under worst-case assumptions and scenarios, simulations show that the XMT processor sustains its advantage over the GPU on irregular parallel programs, while not falling significantly behind on regular programs. The total energy spent per benchmark fits a similar pattern. Given that the two architectures target different types of parallelism, a future system can potentially utilize an XMT chip and a GPU chip in complementary roles.
doi_str_mv 10.1109/ICPADS.2011.101
format Conference Proceeding
isbn 1457718758
9781457718755
eisbn 9780769545769
0769545769
publisher IEEE
date 2011-12
tpages 8
fulltext fulltext_linktorsrc
identifier ISSN: 1521-9097
ispartof 2011 IEEE 17th International Conference on Parallel and Distributed Systems, 2011, p.348-355
issn 1521-9097
2690-5965
language eng
recordid cdi_ieee_primary_6121297
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Benchmark testing
Clocks
Computer architecture
GPU
Graphics processing unit
many-core
parallelism
power and performance comparison
PRAM
Random access memory
Temperature measurement
XMT
title Power-Performance Comparison of Single-Task Driven Many-Cores
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T08%3A10%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Power-Performance%20Comparison%20of%20Single-Task%20Driven%20Many-Cores&rft.btitle=2011%20IEEE%2017th%20International%20Conference%20on%20Parallel%20and%20Distributed%20Systems&rft.au=Keceli,%20F.&rft.date=2011-12&rft.spage=348&rft.epage=355&rft.pages=348-355&rft.issn=1521-9097&rft.eissn=2690-5965&rft.isbn=1457718758&rft.isbn_list=9781457718755&rft_id=info:doi/10.1109/ICPADS.2011.101&rft_dat=%3Cieee_6IE%3E6121297%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769545769&rft.eisbn_list=0769545769&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6121297&rfr_iscdi=true