DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime
GPUs are readily available in cloud computing and personal devices, but their use for data processing acceleration has been slowed down by their limited integration with common programming languages such as Python or Java. Moreover, using GPUs to their full capabilities requires expert knowledge of...
Main authors: | Parravicini, Alberto ; Delamare, Arnaud ; Arnaboldi, Marco ; Santambrogio, Marco D |
---|---|
Format: | Article |
Language: | eng |
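Subjects: | Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture |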
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Parravicini, Alberto ; Delamare, Arnaud ; Arnaboldi, Marco ; Santambrogio, Marco D |
description | GPUs are readily available in cloud computing and personal devices, but their
use for data processing acceleration has been slowed down by their limited
integration with common programming languages such as Python or Java. Moreover,
using GPUs to their full capabilities requires expert knowledge of asynchronous
programming. In this work, we present a novel GPU runtime scheduler for
multi-task GPU computations that transparently provides asynchronous execution,
space-sharing, and transfer-computation overlap without requiring any advance
information about the program dependency structure. We leverage the GrCUDA
polyglot API to integrate our scheduler with multiple high-level languages and
provide a platform for fast prototyping and easy GPU acceleration. We validate
our work on six benchmarks designed to evaluate task parallelism and show an
average speedup of 44% over synchronous execution, with no execution-time
slowdown compared to hand-optimized host code written using the C++ CUDA Graphs
API. |
doi_str_mv | 10.48550/arxiv.2012.09646 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2012_09646</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2012_09646</sourcerecordid><originalsourceid>FETCH-LOGICAL-a676-1a52ff302364074f0d99b55dd4e85fbe3bc448cfbd5dfdda60353ac7fe13282f3</originalsourceid><addsrcrecordid>eNotz71OwzAYhWEvDKhwAUz4BhIc_yUZowIBqRVVW-bwxT-NhZtEjgP07lEL05He4UgPQncZSXkhBHmA8OO-UkoympJScnmNPh6rOmlhMhrvVGf07F1_wN8udnhrpmEOyuBdB-Fc7RDwevbRJRGmT1yNo3cKohv6CbseA94M_nTwQ8T15h1v5z66o7lBVxb8ZG7_d4H2z0_75Uuyeqtfl9UqAZnLJANBrWWEMslJzi3RZdkKoTU3hbCtYa3ivFC21UJbrUESJhio3JqM0YJatkD3f7cXYjMGd4Rwas7U5kJlv3D3T20</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime</title><source>arXiv.org</source><creator>Parravicini, Alberto ; Delamare, Arnaud ; Arnaboldi, Marco ; Santambrogio, Marco D</creator><creatorcontrib>Parravicini, Alberto ; Delamare, Arnaud ; Arnaboldi, Marco ; Santambrogio, Marco D</creatorcontrib><description>GPUs are readily available in cloud computing and personal devices, but their
use for data processing acceleration has been slowed down by their limited
integration with common programming languages such as Python or Java. Moreover,
using GPUs to their full capabilities requires expert knowledge of asynchronous
programming. In this work, we present a novel GPU run time scheduler for
multi-task GPU computations that transparently provides asynchronous execution,
space-sharing, and transfer-computation overlap without requiring in advance
any information about the program dependency structure. We leverage the GrCUDA
polyglot API to integrate our scheduler with multiple high-level languages and
provide a platform for fast prototyping and easy GPU acceleration. We validate
our work on 6 benchmarks created to evaluate task-parallelism and show an
average of 44% speedup against synchronous execution, with no execution time
slowdown compared to hand-optimized host code written using the C++ CUDA Graphs
API.</description><identifier>DOI: 10.48550/arxiv.2012.09646</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture</subject><creationdate>2020-12</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2012.09646$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2012.09646$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Parravicini, Alberto</creatorcontrib><creatorcontrib>Delamare, Arnaud</creatorcontrib><creatorcontrib>Arnaboldi, Marco</creatorcontrib><creatorcontrib>Santambrogio, Marco D</creatorcontrib><title>DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime</title><description>GPUs are readily available in cloud computing and personal devices, but their
use for data processing acceleration has been slowed down by their limited
integration with common programming languages such as Python or Java. Moreover,
using GPUs to their full capabilities requires expert knowledge of asynchronous
programming. In this work, we present a novel GPU run time scheduler for
multi-task GPU computations that transparently provides asynchronous execution,
space-sharing, and transfer-computation overlap without requiring in advance
any information about the program dependency structure. We leverage the GrCUDA
polyglot API to integrate our scheduler with multiple high-level languages and
provide a platform for fast prototyping and easy GPU acceleration. We validate
our work on 6 benchmarks created to evaluate task-parallelism and show an
average of 44% speedup against synchronous execution, with no execution time
slowdown compared to hand-optimized host code written using the C++ CUDA Graphs
API.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><subject>Computer Science - Hardware Architecture</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz71OwzAYhWEvDKhwAUz4BhIc_yUZowIBqRVVW-bwxT-NhZtEjgP07lEL05He4UgPQncZSXkhBHmA8OO-UkoympJScnmNPh6rOmlhMhrvVGf07F1_wN8udnhrpmEOyuBdB-Fc7RDwevbRJRGmT1yNo3cKohv6CbseA94M_nTwQ8T15h1v5z66o7lBVxb8ZG7_d4H2z0_75Uuyeqtfl9UqAZnLJANBrWWEMslJzi3RZdkKoTU3hbCtYa3ivFC21UJbrUESJhio3JqM0YJatkD3f7cXYjMGd4Rwas7U5kJlv3D3T20</recordid><startdate>20201217</startdate><enddate>20201217</enddate><creator>Parravicini, Alberto</creator><creator>Delamare, Arnaud</creator><creator>Arnaboldi, Marco</creator><creator>Santambrogio, Marco D</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20201217</creationdate><title>DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime</title><author>Parravicini, Alberto ; Delamare, Arnaud ; Arnaboldi, Marco ; Santambrogio, Marco D</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a676-1a52ff302364074f0d99b55dd4e85fbe3bc448cfbd5dfdda60353ac7fe13282f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><topic>Computer Science - Hardware Architecture</topic><toplevel>online_resources</toplevel><creatorcontrib>Parravicini, Alberto</creatorcontrib><creatorcontrib>Delamare, Arnaud</creatorcontrib><creatorcontrib>Arnaboldi, Marco</creatorcontrib><creatorcontrib>Santambrogio, Marco D</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Parravicini, Alberto</au><au>Delamare, Arnaud</au><au>Arnaboldi, Marco</au><au>Santambrogio, Marco D</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime</atitle><date>2020-12-17</date><risdate>2020</risdate><abstract>GPUs are readily available in cloud computing and personal devices, but their
use for data processing acceleration has been slowed down by their limited
integration with common programming languages such as Python or Java. Moreover,
using GPUs to their full capabilities requires expert knowledge of asynchronous
programming. In this work, we present a novel GPU run time scheduler for
multi-task GPU computations that transparently provides asynchronous execution,
space-sharing, and transfer-computation overlap without requiring in advance
any information about the program dependency structure. We leverage the GrCUDA
polyglot API to integrate our scheduler with multiple high-level languages and
provide a platform for fast prototyping and easy GPU acceleration. We validate
our work on 6 benchmarks created to evaluate task-parallelism and show an
average of 44% speedup against synchronous execution, with no execution time
slowdown compared to hand-optimized host code written using the C++ CUDA Graphs
API.</abstract><doi>10.48550/arxiv.2012.09646</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2012.09646 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2012_09646 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Hardware Architecture |
title | DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T20%3A28%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DAG-based%20Scheduling%20with%20Resource%20Sharing%20for%20Multi-task%20Applications%20in%20a%20Polyglot%20GPU%20Runtime&rft.au=Parravicini,%20Alberto&rft.date=2020-12-17&rft_id=info:doi/10.48550/arxiv.2012.09646&rft_dat=%3Carxiv_GOX%3E2012_09646%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |