Ginkgo—A math library designed for platform portability

Bibliographic details
Published in:	Parallel computing 2022-07, Vol.111 (C), p.102902, Article 102902
Main authors:	Cojean, Terry; Tsai, Yu-Hsiang Mike; Anzt, Hartwig
Format:	Article
Language:	eng
Keywords:	AMD; Intel; NVIDIA; Performance portability; Platform Portability; Porting to GPU accelerators
Online access:	Full text

container_end_page
container_issue	C
container_start_page	102902
container_title	Parallel computing
container_volume	111
creator	Cojean, Terry; Tsai, Yu-Hsiang Mike; Anzt, Hartwig
description	In an era of increasing computer system diversity, the portability of software from one system to another plays a central role. Software portability is important for software developers, as many software projects have a lifetime longer than a specific system (e.g., a supercomputer), and it is important for domain scientists who realize their scientific application in a software framework and want to be able to run it on one system or another. At a high level, there exist two approaches for realizing platform portability: (1) implementing software using a portability layer, leveraging any technique that always generates specific kernels from another language or through an interface for running on different architectures; and (2) providing backends for different hardware architectures, with the backends typically differing in how, and in which programming language, functionality is realized, because each uses the language of choice for its hardware (e.g., CUDA kernels for NVIDIA GPUs, SYCL (DPC++) kernels targeting Intel GPUs and other supported hardware, …). In practice, these two approaches can be combined in applications to leverage their respective strengths. In this paper, we present how we realize portability across different hardware architectures for the Ginkgo library by following the second strategy, with the goal of not only porting to new hardware architectures but also achieving good performance. We present the Ginkgo library design, which separates algorithms from the hardware-specific kernels that form the distinct hardware executors, and report our experience when adding execution backends for NVIDIA, AMD, and Intel GPUs. We also present the performance we achieve with this approach for the distinct hardware backends.
• We discuss the Ginkgo design, which separates the numerical core from the architecture-specific backends written in the architecture-specific language to allow for performance portability.
• We discuss how we ported Ginkgo to AMD GPUs by creating a HIP backend.
• We discuss how we ported Ginkgo to Intel GPUs by creating a DPC++ backend.
• We present performance results for basic sparse linear algebra kernels and complete Krylov iterative solvers running on AMD, NVIDIA, and Intel GPUs.
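The abstract describes strategy (2): separate backends per architecture, with algorithms kept apart from hardware-specific kernels behind executors. The following C++ sketch illustrates that general idea under stated assumptions. Every name in it (`Executor`, `ReferenceExecutor`, `GpuExecutor`, `Operation`, `AxpyOperation`, `scaled_update`, and the kernel namespaces) is hypothetical and is not Ginkgo's actual API, and the "GPU" kernel is ordinary host code standing in for a CUDA, HIP, or DPC++ kernel. It shows a single operation, axpy (y := alpha * x + y), dispatched to a backend-specific kernel through a double-dispatch executor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal sketch of strategy (2): one kernel implementation per backend,
// selected at run time through an executor. All names are hypothetical;
// this is NOT Ginkgo's actual API.

// --- Hardware-specific kernels (one namespace per backend) ---
namespace reference_kernels {
// y := alpha * x + y, plain sequential loop
inline void axpy(double alpha, const std::vector<double>& x,
                 std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}
}  // namespace reference_kernels

namespace gpu_kernels {
// Host-side stand-in for a CUDA/HIP/DPC++ kernel: the index space is split
// into fixed-size "blocks" as a GPU launch would be, but it runs on the CPU.
inline void axpy(double alpha, const std::vector<double>& x,
                 std::vector<double>& y) {
    assert(x.size() == y.size());
    constexpr std::size_t block_size = 4;  // hypothetical launch parameter
    for (std::size_t block = 0; block * block_size < x.size(); ++block) {
        for (std::size_t t = 0; t < block_size; ++t) {
            const std::size_t i = block * block_size + t;
            if (i < x.size()) y[i] += alpha * x[i];  // bounds guard, as in a kernel
        }
    }
}
}  // namespace gpu_kernels

class ReferenceExecutor;
class GpuExecutor;

// An operation offers one entry point per backend.
class Operation {
public:
    virtual ~Operation() = default;
    virtual void run(const ReferenceExecutor&) const = 0;
    virtual void run(const GpuExecutor&) const = 0;
};

// Each executor forwards to the operation's entry point for its backend.
class Executor {
public:
    virtual ~Executor() = default;
    virtual void run(const Operation& op) const = 0;
    virtual std::string name() const = 0;
};

class ReferenceExecutor : public Executor {
public:
    void run(const Operation& op) const override { op.run(*this); }
    std::string name() const override { return "reference"; }
};

class GpuExecutor : public Executor {
public:
    void run(const Operation& op) const override { op.run(*this); }
    std::string name() const override { return "gpu"; }
};

// Binds the axpy arguments and maps each backend to its kernel.
class AxpyOperation : public Operation {
public:
    AxpyOperation(double alpha, const std::vector<double>& x,
                  std::vector<double>& y)
        : alpha_(alpha), x_(x), y_(y) {}
    void run(const ReferenceExecutor&) const override {
        reference_kernels::axpy(alpha_, x_, y_);
    }
    void run(const GpuExecutor&) const override {
        gpu_kernels::axpy(alpha_, x_, y_);
    }

private:
    double alpha_;
    const std::vector<double>& x_;
    std::vector<double>& y_;
};

// Hardware-agnostic algorithm code: it sees only the Executor interface.
inline void scaled_update(const Executor& exec, double alpha,
                          const std::vector<double>& x,
                          std::vector<double>& y) {
    exec.run(AxpyOperation(alpha, x, y));
}
```

For example, with `x = {1, 2, 3}`, `y = {1, 1, 1}`, and `alpha = 2`, calling `scaled_update` with either executor leaves `y = {3, 5, 7}`. In this double-dispatch design, adding a backend means a new kernel namespace, a new executor class, and one more `run` overload in `Operation` and in every concrete operation, while algorithm code such as `scaled_update` stays unchanged.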
doi_str_mv	10.1016/j.parco.2022.102902
format	Article
publisher	Netherlands: Elsevier B.V
rights	2022 Elsevier B.V.
oa	free_for_read
linktohtml	https://www.sciencedirect.com/science/article/pii/S0167819122000096
backlink	https://www.osti.gov/biblio/1872165
fulltext	fulltext
identifier	ISSN: 0167-8191
ispartof	Parallel computing, 2022-07, Vol.111 (C), p.102902, Article 102902
issn	0167-8191 1872-7336
language	eng
recordid	cdi_osti_scitechconnect_1872165
source	Elsevier ScienceDirect Journals
subjects	AMD; Intel; NVIDIA; Performance portability; Platform Portability; Porting to GPU accelerators
title	Ginkgo—A math library designed for platform portability
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T14%3A32%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ginkgo%E2%80%94A%20math%20library%20designed%20for%20platform%20portability&rft.jtitle=Parallel%20computing&rft.au=Cojean,%20Terry&rft.date=2022-07&rft.volume=111&rft.issue=C&rft.spage=102902&rft.pages=102902-&rft.artnum=102902&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/j.parco.2022.102902&rft_dat=%3Celsevier_osti_%3ES0167819122000096%3C/elsevier_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_els_id=S0167819122000096&rfr_iscdi=true