Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading

Bibliographic Details
Published in: The Journal of supercomputing, 2021-05, Vol. 77 (5), p. 4957-4987
Main Authors: Wang, Farui; Zhang, Weizhe; Guo, Haonan; Hao, Meng; Lu, Gangzhao; Wang, Zheng
Format: Article
Language: English
Online Access: Full text
Description: Heterogeneous multicores like GPGPUs are now commonplace in modern computing systems. Although heterogeneous multicores offer the potential for high performance, programmers are struggling to program such systems. This paper presents OAO, a compiler-based approach to automatically translate shared-memory OpenMP data-parallel programs to run on heterogeneous multicores through OpenMP offloading directives. Given the large user base of shared-memory OpenMP programs, our approach allows programmers to continue using a single-source-based programming language that they are familiar with while benefiting from the heterogeneous performance. OAO introduces a novel runtime optimization scheme to automatically eliminate unnecessary host–device communication to minimize the communication overhead between the host and the accelerator device. We evaluate OAO by applying it to 23 benchmarks from the PolyBench and Rodinia suites on two distinct GPU platforms. Experimental results show that OAO achieves up to 32× speedup over the original OpenMP version, and can reduce the host–device communication overhead by up to 99% over the hand-translated version.
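To illustrate the kind of transformation the abstract describes, the sketch below contrasts a shared-memory OpenMP loop with a hand-written OpenMP offloading equivalent of the sort OAO aims to generate automatically. The function names and the explicit `map` clauses are illustrative assumptions, not OAO's actual output; on a compiler without offloading support the pragmas are simply ignored and both loops run on the host.

```c
#include <stddef.h>

/* Shared-memory OpenMP version: the loop runs across host CPU threads. */
void vec_add_host(const float *a, const float *b, float *c, size_t n) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Offloaded version: `target` moves the loop onto the accelerator, and the
 * map clauses spell out the host-device transfers -- exactly the traffic
 * that OAO's runtime optimization scheme tries to minimize by eliminating
 * redundant copies between consecutive offloaded regions. */
void vec_add_offload(const float *a, const float *b, float *c, size_t n) {
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Both functions compute the same result; the difference is where the loop executes and who pays for the data movement, which is why a translator that inserts only the necessary `map` transfers can cut communication overhead so sharply.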
DOI: 10.1007/s11227-020-03452-2
ISSN: 0920-8542
EISSN: 1573-0484
Source: Springer Nature - Complete Springer Journals
Subjects: Communication; Compilers; Computer Science; Interpreters; Optimization; Parallel programming; Processor Architectures; Programmers; Programming Languages