Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, such as GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allow...

Bibliographic details
Published in:	Journal of parallel and distributed computing 2019-03, Vol.125, p.45-57
Main authors:	Pérez, B., Stafford, E., Bosque, J.L., Beivide, R., Mateo, S., Teruel, X., Martorell, X., Ayguadé, E.
Format:	Article
Language:	eng
Subjects:	Arquitectura de computadors; Arquitectures paral·leles; Co-execution; Heterogeneous systems; Informàtica; OmpSs programming model; OpenCL; Parallel programming (Computer science); Programació en paral·lel (Informàtica); Àrees temàtiques de la UPC
Online access:	Full text

container_end_page	57
container_issue
container_start_page	45
container_title	Journal of parallel and distributed computing
container_volume	125
creator	Pérez, B.; Stafford, E.; Bosque, J.L.; Beivide, R.; Mateo, S.; Teruel, X.; Martorell, X.; Ayguadé, E.
description	The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, such as GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues: first, the automatic distribution of datasets and the management of device memory address spaces; second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results.
doi_str_mv	10.1016/j.jpdc.2018.11.001
format	Article
fullrecord	<record><control><sourceid>csuc_cross</sourceid><recordid>TN_cdi_csuc_recercat_oai_recercat_cat_2072_338082</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0743731518308189</els_id><sourcerecordid>oai_recercat_cat_2072_338082</sourcerecordid><originalsourceid>FETCH-LOGICAL-c386t-3f1cd435c9761a4eef949957e39ce15cba29b68b3312495f2cb6d40d08e99f3e3</originalsourceid><addsrcrecordid>eNp9kM9KxDAQh4MouK6-gKe8QGsm6Z8EvCyLusLCHtRzaNOppu42JUnFfXtbXPDmYRgGft8M8xFyCywFBsVdl3ZDY1LOQKYAKWNwRhbAVJEwmclzsmBlJpJSQH5JrkLopgDkpVyQzWqMLoljjw3dDdivt_QTfY97alyC32jGaF1PbU93h-El0NZ5-oERvXvHHt0YaDiGiIdwTS7aah_w5tSX5O3x4XW9Sba7p-f1apsYIYuYiBZMk4ncqLKAKkNsVaZUXqJQBiE3dcVVXchaCOCZyltu6qLJWMMkKtUKFEsCv3tNGI32aNCbKmpX2b9hLs5KroWQTPKJ4SfGuxA8tnrw9lD5owamZ3-607M_PfvTAHrSM0H3vxBO33xZ9DoYi73Bxk6Hom6c_Q__AVhXeVk</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems</title><source>Recercat</source><source>Elsevier ScienceDirect Journals</source><creator>Pérez, B. ; Stafford, E. ; Bosque, J.L. ; Beivide, R. ; Mateo, S. ; Teruel, X. ; Martorell, X. ; Ayguadé, E.</creator><creatorcontrib>Pérez, B. ; Stafford, E. ; Bosque, J.L. ; Beivide, R. ; Mateo, S. ; Teruel, X. ; Martorell, X. ; Ayguadé, E.</creatorcontrib><description>The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task based parallel applications, that allows the execution of OpenCl kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. 
This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues. First, the automatic distribution of datasets and the management of device memory address spaces. Second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results.</description><identifier>ISSN: 0743-7315</identifier><identifier>EISSN: 1096-0848</identifier><identifier>DOI: 10.1016/j.jpdc.2018.11.001</identifier><language>eng</language><publisher>Elsevier Inc</publisher><subject>Arquitectura de computadors ; Arquitectures paral·leles ; Co-execution ; Heterogeneous systems ; Informàtica ; OmpSs programming model ; OpenCL ; Parallel programming (Computer science) ; Programació en paral·lel (Informàtica) ; Àrees temàtiques de la UPC</subject><ispartof>Journal of parallel and distributed computing, 2019-03, Vol.125, p.45-57</ispartof><rights>2018 Elsevier Inc.</rights><rights>Attribution-NonCommercial-NoDerivs 3.0 Spain info:eu-repo/semantics/openAccess <a href="http://creativecommons.org/licenses/by-nc-nd/3.0/es/">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</a></rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c386t-3f1cd435c9761a4eef949957e39ce15cba29b68b3312495f2cb6d40d08e99f3e3</citedby><cites>FETCH-LOGICAL-c386t-3f1cd435c9761a4eef949957e39ce15cba29b68b3312495f2cb6d40d08e99f3e3</cites><orcidid>0000-0002-0417-3430 ; 
0000-0002-7718-8449</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0743731518308189$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>230,314,776,780,881,3537,26951,27901,27902,65534</link.rule.ids></links><search><creatorcontrib>Pérez, B.</creatorcontrib><creatorcontrib>Stafford, E.</creatorcontrib><creatorcontrib>Bosque, J.L.</creatorcontrib><creatorcontrib>Beivide, R.</creatorcontrib><creatorcontrib>Mateo, S.</creatorcontrib><creatorcontrib>Teruel, X.</creatorcontrib><creatorcontrib>Martorell, X.</creatorcontrib><creatorcontrib>Ayguadé, E.</creatorcontrib><title>Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems</title><title>Journal of parallel and distributed computing</title><description>The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task based parallel applications, that allows the execution of OpenCl kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues. First, the automatic distribution of datasets and the management of device memory address spaces. Second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. 
Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results.</description><subject>Arquitectura de computadors</subject><subject>Arquitectures paral·leles</subject><subject>Co-execution</subject><subject>Heterogeneous systems</subject><subject>Informàtica</subject><subject>OmpSs programming model</subject><subject>OpenCL</subject><subject>Parallel programming (Computer science)</subject><subject>Programació en paral·lel (Informàtica)</subject><subject>Àrees temàtiques de la UPC</subject><issn>0743-7315</issn><issn>1096-0848</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>XX2</sourceid><recordid>eNp9kM9KxDAQh4MouK6-gKe8QGsm6Z8EvCyLusLCHtRzaNOppu42JUnFfXtbXPDmYRgGft8M8xFyCywFBsVdl3ZDY1LOQKYAKWNwRhbAVJEwmclzsmBlJpJSQH5JrkLopgDkpVyQzWqMLoljjw3dDdivt_QTfY97alyC32jGaF1PbU93h-El0NZ5-oERvXvHHt0YaDiGiIdwTS7aah_w5tSX5O3x4XW9Sba7p-f1apsYIYuYiBZMk4ncqLKAKkNsVaZUXqJQBiE3dcVVXchaCOCZyltu6qLJWMMkKtUKFEsCv3tNGI32aNCbKmpX2b9hLs5KroWQTPKJ4SfGuxA8tnrw9lD5owamZ3-607M_PfvTAHrSM0H3vxBO33xZ9DoYi73Bxk6Hom6c_Q__AVhXeVk</recordid><startdate>201903</startdate><enddate>201903</enddate><creator>Pérez, B.</creator><creator>Stafford, E.</creator><creator>Bosque, J.L.</creator><creator>Beivide, R.</creator><creator>Mateo, S.</creator><creator>Teruel, X.</creator><creator>Martorell, X.</creator><creator>Ayguadé, E.</creator><general>Elsevier Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>XX2</scope><orcidid>https://orcid.org/0000-0002-0417-3430</orcidid><orcidid>https://orcid.org/0000-0002-7718-8449</orcidid></search><sort><creationdate>201903</creationdate><title>Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems</title><author>Pérez, B. ; Stafford, E. ; Bosque, J.L. ; Beivide, R. ; Mateo, S. ; Teruel, X. ; Martorell, X. 
; Ayguadé, E.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c386t-3f1cd435c9761a4eef949957e39ce15cba29b68b3312495f2cb6d40d08e99f3e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Arquitectura de computadors</topic><topic>Arquitectures paral·leles</topic><topic>Co-execution</topic><topic>Heterogeneous systems</topic><topic>Informàtica</topic><topic>OmpSs programming model</topic><topic>OpenCL</topic><topic>Parallel programming (Computer science)</topic><topic>Programació en paral·lel (Informàtica)</topic><topic>Àrees temàtiques de la UPC</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pérez, B.</creatorcontrib><creatorcontrib>Stafford, E.</creatorcontrib><creatorcontrib>Bosque, J.L.</creatorcontrib><creatorcontrib>Beivide, R.</creatorcontrib><creatorcontrib>Mateo, S.</creatorcontrib><creatorcontrib>Teruel, X.</creatorcontrib><creatorcontrib>Martorell, X.</creatorcontrib><creatorcontrib>Ayguadé, E.</creatorcontrib><collection>CrossRef</collection><collection>Recercat</collection><jtitle>Journal of parallel and distributed computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pérez, B.</au><au>Stafford, E.</au><au>Bosque, J.L.</au><au>Beivide, R.</au><au>Mateo, S.</au><au>Teruel, X.</au><au>Martorell, X.</au><au>Ayguadé, E.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems</atitle><jtitle>Journal of parallel and distributed computing</jtitle><date>2019-03</date><risdate>2019</risdate><volume>125</volume><spage>45</spage><epage>57</epage><pages>45-57</pages><issn>0743-7315</issn><eissn>1096-0848</eissn><abstract>The emergence of heterogeneous systems has been very notable recently. 
The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task based parallel applications, that allows the execution of OpenCl kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues. First, the automatic distribution of datasets and the management of device memory address spaces. Second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results.</abstract><pub>Elsevier Inc</pub><doi>10.1016/j.jpdc.2018.11.001</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-0417-3430</orcidid><orcidid>https://orcid.org/0000-0002-7718-8449</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0743-7315
ispartof	Journal of parallel and distributed computing, 2019-03, Vol.125, p.45-57
issn	0743-7315; 1096-0848
language	eng
recordid	cdi_csuc_recercat_oai_recercat_cat_2072_338082
source	Recercat; Elsevier ScienceDirect Journals
subjects	Arquitectura de computadors; Arquitectures paral·leles; Co-execution; Heterogeneous systems; Informàtica; OmpSs programming model; OpenCL; Parallel programming (Computer science); Programació en paral·lel (Informàtica); Àrees temàtiques de la UPC
title	Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T22%3A49%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-csuc_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Auto-tuned%20OpenCL%20kernel%20co-execution%20in%20OmpSs%20for%20heterogeneous%20systems&rft.jtitle=Journal%20of%20parallel%20and%20distributed%20computing&rft.au=P%C3%A9rez,%20B.&rft.date=2019-03&rft.volume=125&rft.spage=45&rft.epage=57&rft.pages=45-57&rft.issn=0743-7315&rft.eissn=1096-0848&rft_id=info:doi/10.1016/j.jpdc.2018.11.001&rft_dat=%3Ccsuc_cross%3Eoai_recercat_cat_2072_338082%3C/csuc_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_els_id=S0743731518308189&rfr_iscdi=true