A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle
Published in: | arXiv.org 2019-01 |
---|---|
Main authors: | Deters, Justin; Wu, Jiaye; Xu, Yifan; Lee, I-Ting Angelina |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Deters, Justin; Wu, Jiaye; Xu, Yifan; Lee, I-Ting Angelina |
description | Task parallelism is designed to simplify the task of parallel programming. When executing a task parallel program on modern NUMA architectures, it can fail to scale due to the phenomenon called work inflation, in which the overall processing time that multiple cores spend doing useful work is higher than the time required to do the same amount of work on one core, due to effects experienced only during parallel execution, such as additional cache misses, remote memory accesses, and memory bandwidth contention. It is possible to mitigate work inflation by co-locating the computation with the data, but this is nontrivial to do in task parallel programs. First, by design, the scheduling of task parallel programs is automated, giving the user little control over where the computation is performed. Second, such platforms tend to employ work stealing, which provides strong theoretical guarantees, but whose randomized load-balancing protocol does not distinguish between work items that are far away and ones that are nearby. In this work, we propose NUMA-WS, a NUMA-aware task parallel platform engineered based on the work-first principle. By abiding by the work-first principle, we obtain a platform that is work efficient, provides the same theoretical guarantees as the classic work stealing scheduler, and mitigates work inflation. Furthermore, we implemented a prototype platform by modifying Intel's Cilk Plus runtime system and empirically demonstrate that the resulting system is work efficient and scalable. |
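The abstract notes that classic work stealing picks victims uniformly at random, so it cannot prefer nearby work over remote work. The following sketch illustrates that distinction only; it is a hypothetical illustration, not the mechanism NUMA-WS actually uses (the paper's approach is based on the work-first principle, and `socket_of`, `local_bias`, and both function names are invented for this example).

```python
import random

def uniform_victim(thief, workers):
    """NUMA-oblivious stealing: every other worker is an equally
    likely victim, regardless of which socket it sits on."""
    candidates = [w for w in workers if w != thief]
    return random.choice(candidates)

def numa_biased_victim(thief, workers, socket_of, local_bias=0.75):
    """Hypothetical NUMA-aware variant: with probability local_bias,
    steal from a worker on the thief's own socket (when one exists);
    otherwise fall back to a uniformly random remote worker."""
    local = [w for w in workers
             if w != thief and socket_of[w] == socket_of[thief]]
    remote = [w for w in workers
              if w != thief and socket_of[w] != socket_of[thief]]
    if local and (not remote or random.random() < local_bias):
        return random.choice(local)
    return random.choice(remote)
```

For example, with eight workers split across two sockets (`socket_of = {w: w // 4 for w in range(8)}`), `numa_biased_victim(0, ...)` selects a same-socket victim roughly 75% of the time, whereas `uniform_victim` spreads steals evenly across both sockets. A bias like this trades some of the load-balancing randomness that the classic analysis relies on for better locality, which is why preserving the theoretical guarantees, as NUMA-WS does, is nontrivial.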
doi_str_mv | 10.48550/arxiv.1806.11128 |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
rights | 2019. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). |
related | Published version: https://doi.org/10.1109/IISWC.2018.8573486 |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2019-01 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_1806_11128 |
source | arXiv.org; Free E-Journals |
subjects | Automatic control; Computation; Computer memory; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Performance; First principles; Parallel programming; Task scheduling |
title | A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle |