Affinity-aware work-stealing for integrated CPU-GPU processors

Recent integrated CPU-GPU processors like Intel's Broadwell and AMD's Kaveri support hardware CPU-GPU shared virtual memory, atomic operations, and memory coherency. This enables fine-grained CPU-GPU work-stealing, but architectural differences between the CPU and GPU hurt the performance...

Detailed description

Bibliographic details
Published in:	SIGPLAN notices 2016-11, Vol.51 (8), p.1-2
Main authors:	Farooqui, Naila, Barik, Rajkishore, Lewis, Brian T., Shpeisman, Tatiana, Schwan, Karsten
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	2
container_issue	8
container_start_page	1
container_title	SIGPLAN notices
container_volume	51
creator	Farooqui, Naila ; Barik, Rajkishore ; Lewis, Brian T. ; Shpeisman, Tatiana ; Schwan, Karsten
description	Recent integrated CPU-GPU processors like Intel's Broadwell and AMD's Kaveri support hardware CPU-GPU shared virtual memory, atomic operations, and memory coherency. This enables fine-grained CPU-GPU work-stealing, but architectural differences between the CPU and GPU hurt the performance of traditionally-implemented work-stealing on such processors. These architectural differences include different clock frequencies, atomic operation costs, and cache and shared memory latencies. This paper describes a preliminary implementation of our work-stealing scheduler, Libra, which includes techniques to deal with these architectural differences in integrated CPU-GPU processors. Libra's affinity-aware techniques achieve significant performance gains over classically-implemented work-stealing. We show preliminary results using a diverse set of nine regular and irregular workloads running on an Intel Broadwell Core-M processor. Libra currently achieves up to a 2× performance improvement over classical work-stealing, with a 20% average improvement.
doi_str_mv	10.1145/3016078.2851194
format	Article
fulltext	fulltext
identifier	ISSN: 0362-1340
ispartof	SIGPLAN notices, 2016-11, Vol.51 (8), p.1-2
issn	0362-1340 1558-1160
language	eng
recordid	cdi_crossref_primary_10_1145_3016078_2851194
source	ACM Digital Library Complete
title	Affinity-aware work-stealing for integrated CPU-GPU processors
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T08%3A29%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Affinity-aware%20work-stealing%20for%20integrated%20CPU-GPU%20processors&rft.jtitle=SIGPLAN%20notices&rft.au=Farooqui,%20Naila&rft.date=2016-11-09&rft.volume=51&rft.issue=8&rft.spage=1&rft.epage=2&rft.pages=1-2&rft.issn=0362-1340&rft.eissn=1558-1160&rft_id=info:doi/10.1145/3016078.2851194&rft_dat=%3Ccrossref%3E10_1145_3016078_2851194%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true