Clustered Pipelined Multithreading on Commodity Multi-Core Processors

Recently proposed pipelined multithreading (PMT) techniques have shown great applicability to parallelizing general programs on multi-core processors. However, the potential performance of these techniques is limited by the large inter-core communication overheads which become a performance bottlene...

Full description

Bibliographic details
Published in:	Shisutemu Seigyo Jouhou Gakkai rombunshi (Transactions of the Institute of Systems, Control and Information Engineers), 2009, Vol.22(11), pp.371-384
Main authors:	Zhang, Yuanming; Ootsu, Kanemitsu; Yokota, Takashi; Baba, Takanobu
Format:	Article
Language:	eng
Subjects:	average communication overheads; clustered communication mechanism; commodity multi-core processors; pipelined multithreading; stage decomposition
Online access:	Full text

container_end_page	384
container_issue	11
container_start_page	371
container_title	Shisutemu Seigyo Jouhou Gakkai rombunshi
container_volume	22
creator	Zhang, Yuanming; Ootsu, Kanemitsu; Yokota, Takashi; Baba, Takanobu
description	Recently proposed pipelined multithreading (PMT) techniques have shown great applicability to parallelizing general programs on multi-core processors. However, their potential performance is limited by large inter-core communication overheads, which become a performance bottleneck. This paper addresses this problem and presents a novel clustered pipelined multithreading (CPMT) technique that can construct efficient pipeline parallelism on commodity multi-core processors. The technique incorporates a clustered communication mechanism that can greatly reduce average communication overheads (ACOs) using a software-only approach. We quantitatively demonstrate that the performance of CPMT can be improved by reducing the ACOs, and we show its performance characteristics. Moreover, we give the stage decomposition procedure and provide a stage execution framework that can execute multiple stages within one procedure. The effectiveness of the CPMT technique has been evaluated on commodity AMD Phenom four-core processors. Experimental results show that our CPMT technique achieves speedups ranging from 116.8% to 219.8% on some typical loops extracted from SPEC CPU 2000 benchmark programs.
doi_str_mv	10.5687/iscie.22.371
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1439116889</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3088790171</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1891-575c134744b985970b3aa9b7c7c32fc153ab4c44afe3f885c7975d2584ab8e753</originalsourceid><addsrcrecordid>eNpFkE1Lw0AQhhdRsNTe_AEBr6Zm9iO7OUqoH1CxBwVvy2Yzabek2bqbHvrvjabUy8zA-8y8w0vILWRzkSv54KJ1OKd0ziRckAkFJVIF8HVJJsA4TUWeq2syi9FVGQPJAZiYkEXZHmKPAetk5fbYum6Y3g5t7_pNQFO7bp34Lin9budr1x9HLS19wGQVvMUYfYg35KoxbcTZqU_J59Pio3xJl-_Pr-XjMrWgCkiFFHb4RXJeFUoUMquYMUUlrbSMNhYEMxW3nJsGWaOUsLKQoqZCcVMplIJNyd14dx_89wFjr7f-ELrBUgNnBUCuVDFQ9yNlg48xYKP3we1MOGrI9G9W-i8rTakeshrwcsS3sTdrPMMm9M62-A8DnOqwdVbtxgSNHfsBKTh1Aw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1439116889</pqid></control><display><type>article</type><title>Clustered Pipelined Multithreading on Commodity Multi-Core Processors</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese</source><creator>Zhang, Yuanming ; Ootsu, Kanemitsu ; Yokota, Takashi ; Baba, Takanobu</creator><creatorcontrib>Zhang, Yuanming ; Ootsu, Kanemitsu ; Yokota, Takashi ; Baba, Takanobu</creatorcontrib><description>Recently proposed pipelined multithreading (PMT) techniques have shown great applicability to parallelizing general programs on multi-core processors. However, the potential performance of these techniques is limited by the large inter-core communication overheads which become a performance bottleneck. This paper addresses this problem and presents a novel clustered pipelined multithreading (CPMT) technique that can construct efficient pipeline parallelism on commodity multi-core processors. 
This technique combines a clustered communication mechanism that can greatly reduce average communication overheads (ACOs) in software only approach. We quantitatively demonstrate the performance of CPMT can be improved through reducing the ACOs and show the performance characteristics. Moreover, we also give the stage decomposition procedure and provide a stage execution framework that can execute the multiple stages within one procedure. The effectiveness of CPMT technique has been evaluated on the commodity AMD Phenom four-core processors. Experimental results show that our CPMT technique achieves speedup ranging from 116.8% to 219.8% on some typical loops extracted from SPEC CPU 2000 benchmark programs.</description><identifier>ISSN: 1342-5668</identifier><identifier>EISSN: 2185-811X</identifier><identifier>DOI: 10.5687/iscie.22.371</identifier><language>eng</language><publisher>Kyoto: THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE)</publisher><subject>average communication overheads ; clustered communication mechanism ; commodity multi-core processors ; pipelined multithreading ; stage decomposition</subject><ispartof>Transactions of the Institute of Systems, Control and Information Engineers, 2009, Vol.22(11), pp.371-384</ispartof><rights>2009 The Institute of Systems, Control and Information Engineers</rights><rights>Copyright Japan Science and Technology Agency 2009</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c1891-575c134744b985970b3aa9b7c7c32fc153ab4c44afe3f885c7975d2584ab8e753</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1876,27903,27904</link.rule.ids></links><search><creatorcontrib>Zhang, Yuanming</creatorcontrib><creatorcontrib>Ootsu, Kanemitsu</creatorcontrib><creatorcontrib>Yokota, 
Takashi</creatorcontrib><creatorcontrib>Baba, Takanobu</creatorcontrib><title>Clustered Pipelined Multithreading on Commodity Multi-Core Processors</title><title>Shisutemu Seigyo Jouhou Gakkai rombunshi</title><addtitle>Transactions of the Institute of Systems, Control and Information Engineers</addtitle><description>Recently proposed pipelined multithreading (PMT) techniques have shown great applicability to parallelizing general programs on multi-core processors. However, the potential performance of these techniques is limited by the large inter-core communication overheads which become a performance bottleneck. This paper addresses this problem and presents a novel clustered pipelined multithreading (CPMT) technique that can construct efficient pipeline parallelism on commodity multi-core processors. This technique combines a clustered communication mechanism that can greatly reduce average communication overheads (ACOs) in software only approach. We quantitatively demonstrate the performance of CPMT can be improved through reducing the ACOs and show the performance characteristics. Moreover, we also give the stage decomposition procedure and provide a stage execution framework that can execute the multiple stages within one procedure. The effectiveness of CPMT technique has been evaluated on the commodity AMD Phenom four-core processors. 
Experimental results show that our CPMT technique achieves speedup ranging from 116.8% to 219.8% on some typical loops extracted from SPEC CPU 2000 benchmark programs.</description><subject>average communication overheads</subject><subject>clustered communication mechanism</subject><subject>commodity multi-core processors</subject><subject>pipelined multithreading</subject><subject>stage decomposition</subject><issn>1342-5668</issn><issn>2185-811X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><recordid>eNpFkE1Lw0AQhhdRsNTe_AEBr6Zm9iO7OUqoH1CxBwVvy2Yzabek2bqbHvrvjabUy8zA-8y8w0vILWRzkSv54KJ1OKd0ziRckAkFJVIF8HVJJsA4TUWeq2syi9FVGQPJAZiYkEXZHmKPAetk5fbYum6Y3g5t7_pNQFO7bp34Lin9budr1x9HLS19wGQVvMUYfYg35KoxbcTZqU_J59Pio3xJl-_Pr-XjMrWgCkiFFHb4RXJeFUoUMquYMUUlrbSMNhYEMxW3nJsGWaOUsLKQoqZCcVMplIJNyd14dx_89wFjr7f-ELrBUgNnBUCuVDFQ9yNlg48xYKP3we1MOGrI9G9W-i8rTakeshrwcsS3sTdrPMMm9M62-A8DnOqwdVbtxgSNHfsBKTh1Aw</recordid><startdate>20091101</startdate><enddate>20091101</enddate><creator>Zhang, Yuanming</creator><creator>Ootsu, Kanemitsu</creator><creator>Yokota, Takashi</creator><creator>Baba, Takanobu</creator><general>THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE)</general><general>Japan Science and Technology Agency</general><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope></search><sort><creationdate>20091101</creationdate><title>Clustered Pipelined Multithreading on Commodity Multi-Core Processors</title><author>Zhang, Yuanming ; Ootsu, Kanemitsu ; Yokota, Takashi ; Baba, Takanobu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1891-575c134744b985970b3aa9b7c7c32fc153ab4c44afe3f885c7975d2584ab8e753</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>average communication overheads</topic><topic>clustered communication mechanism</topic><topic>commodity 
multi-core processors</topic><topic>pipelined multithreading</topic><topic>stage decomposition</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Yuanming</creatorcontrib><creatorcontrib>Ootsu, Kanemitsu</creatorcontrib><creatorcontrib>Yokota, Takashi</creatorcontrib><creatorcontrib>Baba, Takanobu</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><jtitle>Shisutemu Seigyo Jouhou Gakkai rombunshi</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Yuanming</au><au>Ootsu, Kanemitsu</au><au>Yokota, Takashi</au><au>Baba, Takanobu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustered Pipelined Multithreading on Commodity Multi-Core Processors</atitle><jtitle>Shisutemu Seigyo Jouhou Gakkai rombunshi</jtitle><addtitle>Transactions of the Institute of Systems, Control and Information Engineers</addtitle><date>2009-11-01</date><risdate>2009</risdate><volume>22</volume><issue>11</issue><spage>371</spage><epage>384</epage><pages>371-384</pages><issn>1342-5668</issn><eissn>2185-811X</eissn><abstract>Recently proposed pipelined multithreading (PMT) techniques have shown great applicability to parallelizing general programs on multi-core processors. However, the potential performance of these techniques is limited by the large inter-core communication overheads which become a performance bottleneck. This paper addresses this problem and presents a novel clustered pipelined multithreading (CPMT) technique that can construct efficient pipeline parallelism on commodity multi-core processors. This technique combines a clustered communication mechanism that can greatly reduce average communication overheads (ACOs) in software only approach. We quantitatively demonstrate the performance of CPMT can be improved through reducing the ACOs and show the performance characteristics. 
Moreover, we also give the stage decomposition procedure and provide a stage execution framework that can execute the multiple stages within one procedure. The effectiveness of CPMT technique has been evaluated on the commodity AMD Phenom four-core processors. Experimental results show that our CPMT technique achieves speedup ranging from 116.8% to 219.8% on some typical loops extracted from SPEC CPU 2000 benchmark programs.</abstract><cop>Kyoto</cop><pub>THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE)</pub><doi>10.5687/iscie.22.371</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1342-5668
ispartof	Transactions of the Institute of Systems, Control and Information Engineers, 2009, Vol.22(11), pp.371-384
issn	1342-5668; 2185-811X
language	eng
recordid	cdi_proquest_journals_1439116889
source	Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese
subjects	average communication overheads; clustered communication mechanism; commodity multi-core processors; pipelined multithreading; stage decomposition
title	Clustered Pipelined Multithreading on Commodity Multi-Core Processors
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T14%3A46%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustered%20Pipelined%20Multithreading%20on%20Commodity%20Multi-Core%20Processors&rft.jtitle=Shisutemu%20Seigyo%20Jouhou%20Gakkai%20rombunshi&rft.au=Zhang,%20Yuanming&rft.date=2009-11-01&rft.volume=22&rft.issue=11&rft.spage=371&rft.epage=384&rft.pages=371-384&rft.issn=1342-5668&rft.eissn=2185-811X&rft_id=info:doi/10.5687/iscie.22.371&rft_dat=%3Cproquest_cross%3E3088790171%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1439116889&rft_id=info:pmid/&rfr_iscdi=true