The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs

This paper discusses the main performance barriers for solving a large number of independent ordinary differential equation systems on processors (CPU) and graphics cards (GPU). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power...

Detailed description

Bibliographic details
Published in:	Communications in nonlinear science & numerical simulation 2022-09, Vol.112, p.106521, Article 106521
Main authors:	Nagy, Dániel; Plavecz, Lambert; Hegedűs, Ferenc
Format:	Article
Language:	English
Subjects:	Asynchronous; C++ (programming language); Central processing units; Computer peripherals; CPU programming; CPUs; Differential equations; Event handling; GPU programming; Graphics processing units; Hardware; Lorenz equations; Microprocessors; Non-stiff problems; Optimization; Ordinary differential equations; Relief valves; Runge-Kutta method
Online access:	Full text

container_end_page
container_issue
container_start_page	106521
container_title	Communications in nonlinear science & numerical simulation
container_volume	112
creator	Nagy, Dániel; Plavecz, Lambert; Hegedűs, Ferenc
description	This paper discusses the main performance barriers for solving a large number of independent ordinary differential equation systems on processors (CPU) and graphics cards (GPU). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power. The main barriers, identified through a detailed analysis of the hardware architectures and profiling with hardware performance monitoring units, are as follows. First, exploiting the SIMD capabilities of the CPU via vector registers; the solution is to implement or enforce explicit vectorisation. Second, hiding instruction latencies on both CPUs and GPUs, which can be achieved by increasing instruction-level parallelism. Third, efficiently handling large timescale differences or events on the massively parallel architecture of GPUs; a viable option to overcome this difficulty is asynchronous time stepping. The above optimisation techniques and their implementation possibilities are discussed and tested on three program packages: MPGOS, written in C++ and specialised only for GPUs; ODEINT, implemented in C++, which supports execution on both CPUs and GPUs; and DifferentialEquations.jl, written in Julia, which also supports execution on both CPUs and GPUs. The tested systems (Lorenz equation, Keller–Miksis equation and a pressure relief valve model) are non-stiff and have low dimension. Thus, the performance of the codes is not limited by memory bandwidth, and Runge–Kutta type solvers are efficient and suitable choices. The hardware employed is an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per core and an Nvidia GeForce Titan Black GPU with a total of 1707 GFLOPS peak double-precision performance.
• Solving a large number of independent ordinary differential equations. • Massively parallel GPU programming. • Explicit vectorisation in the case of CPUs. • Synchronous and asynchronous parallelisation. • Adaptive solvers and impact dynamics.
doi_str_mv	10.1016/j.cnsns.2022.106521
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2684208578</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1007570422001551</els_id><sourcerecordid>2684208578</sourcerecordid><originalsourceid>FETCH-LOGICAL-c376t-fb84977c39445007ab999f043ed36e203673fe31de98d82653773efd5ab291803</originalsourceid><addsrcrecordid>eNp9UMFOAjEU3BhNRPQLvDTx6mK33d12Dx4MUTQh0QOcm7J9xZKlhb4Fwt9bxLOnN3kz8_Jmsuy-oKOCFvXTatR69DhilLG0qStWXGSDQgqZCybKy4QpFXklaHmd3SCuaHI1VTnIDrNvIDr2JFiCods7vySadDougfjdegHxxPjgc-ydtY-kC4fcuDV4dMHrjoRonNfxSEyiIYLvXdrCdqf7JCB4xB7WSBKcfM2RaG_IOIHb7MrqDuHubw6z-dvrbPyeTz8nH-OXad5yUfe5XciyEaLlTVlWKYJeNE1jacnB8BoY5bXgFnhhoJFGsrriQnCwptIL1hSS8mH2cL67iWG7A-zVKuxiehwVq2XJqKyETCp-VrUxIEawahPdOqVSBVWnhtVK_TasTg2rc8PJ9Xx2QQqwdxAVtg58C8ZFaHtlgvvX_wOXIYVH</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2684208578</pqid></control><display><type>article</type><title>The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs</title><source>Elsevier ScienceDirect Journals</source><creator>Nagy, Dániel ; Plavecz, Lambert ; Hegedűs, Ferenc</creator><creatorcontrib>Nagy, Dániel ; Plavecz, Lambert ; Hegedűs, Ferenc</creatorcontrib><description>This paper discusses the main performance barriers for solving a large number of independent ordinary differential equation systems on processors (CPU) and graphics cards (GPU). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power. The main barriers identified by the detailed analysing of the hardware architectures and profiling using hardware performance monitoring units are as follows. First, exploitation of the SIMD capabilities of the CPU via vector registers. The solution is to implement/enforce explicit vectorisation. 
Second, hiding instruction latencies on both CPUs and GPUs that can be achieved with increasing (instruction-level) parallelism. Third, the efficient handling of large timescale differences or event handling using the massively parallel architecture of GPUs. A viable option to overcome this difficulty is asynchronous time stepping. The above optimisation techniques and their implementation possibilities are discussed and tested on three program packages: MPGOS written in C++ and specialised only for GPUs; ODEINT implemented in C++, which supports execution on both CPUs and GPUs; finally, DifferentialEquations.jl written in Julia that also supports execution on both CPUs and GPUs. The tested systems (Lorenz equation, Keller–Miksis equation and a pressure relief valve model) are non-stiff and have low dimension. Thus, the performance of the codes are not limited by memory bandwidth, and Runge–Kutta type solvers are efficient and suitable choices. The employed hardware are an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per cores and an Nvidia GeForce Titan Black GPU that has a total of 1707 GFLOPS peak double-precision performance. 
•Solving large number of independent ordinary differential equations.•Massively parallel GPU programming.•Explicit vectorisation in the case of CPUs.•Synchronous and asynchronous parallelisation.•Adaptive solvers and impact dynamics.</description><identifier>ISSN: 1007-5704</identifier><identifier>EISSN: 1878-7274</identifier><identifier>DOI: 10.1016/j.cnsns.2022.106521</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Asynchronous ; C++ (programming language) ; Central processing units ; Computer peripherals ; CPU programming ; CPUs ; Differential equations ; Event handling ; GPU programming ; Graphics processing units ; Hardware ; Lorenz equations ; Microprocessors ; Non-stiff problems ; Optimization ; Ordinary differential equations ; Relief valves ; Runge-Kutta method</subject><ispartof>Communications in nonlinear science & numerical simulation, 2022-09, Vol.112, p.106521, Article 106521</ispartof><rights>2022 The Authors</rights><rights>Copyright Elsevier Science Ltd. 
Sep 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c376t-fb84977c39445007ab999f043ed36e203673fe31de98d82653773efd5ab291803</citedby><cites>FETCH-LOGICAL-c376t-fb84977c39445007ab999f043ed36e203673fe31de98d82653773efd5ab291803</cites><orcidid>0000-0002-7077-5042 ; 0000-0002-2939-7692</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1007570422001551$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Nagy, Dániel</creatorcontrib><creatorcontrib>Plavecz, Lambert</creatorcontrib><creatorcontrib>Hegedűs, Ferenc</creatorcontrib><title>The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs</title><title>Communications in nonlinear science & numerical simulation</title><description>This paper discusses the main performance barriers for solving a large number of independent ordinary differential equation systems on processors (CPU) and graphics cards (GPU). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power. The main barriers identified by the detailed analysing of the hardware architectures and profiling using hardware performance monitoring units are as follows. First, exploitation of the SIMD capabilities of the CPU via vector registers. The solution is to implement/enforce explicit vectorisation. Second, hiding instruction latencies on both CPUs and GPUs that can be achieved with increasing (instruction-level) parallelism. Third, the efficient handling of large timescale differences or event handling using the massively parallel architecture of GPUs. 
A viable option to overcome this difficulty is asynchronous time stepping. The above optimisation techniques and their implementation possibilities are discussed and tested on three program packages: MPGOS written in C++ and specialised only for GPUs; ODEINT implemented in C++, which supports execution on both CPUs and GPUs; finally, DifferentialEquations.jl written in Julia that also supports execution on both CPUs and GPUs. The tested systems (Lorenz equation, Keller–Miksis equation and a pressure relief valve model) are non-stiff and have low dimension. Thus, the performance of the codes are not limited by memory bandwidth, and Runge–Kutta type solvers are efficient and suitable choices. The employed hardware are an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per cores and an Nvidia GeForce Titan Black GPU that has a total of 1707 GFLOPS peak double-precision performance. •Solving large number of independent ordinary differential equations.•Massively parallel GPU programming.•Explicit vectorisation in the case of CPUs.•Synchronous and asynchronous parallelisation.•Adaptive solvers and impact dynamics.</description><subject>Asynchronous</subject><subject>C++ (programming language)</subject><subject>Central processing units</subject><subject>Computer peripherals</subject><subject>CPU programming</subject><subject>CPUs</subject><subject>Differential equations</subject><subject>Event handling</subject><subject>GPU programming</subject><subject>Graphics processing units</subject><subject>Hardware</subject><subject>Lorenz equations</subject><subject>Microprocessors</subject><subject>Non-stiff problems</subject><subject>Optimization</subject><subject>Ordinary differential equations</subject><subject>Relief valves</subject><subject>Runge-Kutta 
method</subject><issn>1007-5704</issn><issn>1878-7274</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9UMFOAjEU3BhNRPQLvDTx6mK33d12Dx4MUTQh0QOcm7J9xZKlhb4Fwt9bxLOnN3kz8_Jmsuy-oKOCFvXTatR69DhilLG0qStWXGSDQgqZCybKy4QpFXklaHmd3SCuaHI1VTnIDrNvIDr2JFiCods7vySadDougfjdegHxxPjgc-ydtY-kC4fcuDV4dMHrjoRonNfxSEyiIYLvXdrCdqf7JCB4xB7WSBKcfM2RaG_IOIHb7MrqDuHubw6z-dvrbPyeTz8nH-OXad5yUfe5XciyEaLlTVlWKYJeNE1jacnB8BoY5bXgFnhhoJFGsrriQnCwptIL1hSS8mH2cL67iWG7A-zVKuxiehwVq2XJqKyETCp-VrUxIEawahPdOqVSBVWnhtVK_TasTg2rc8PJ9Xx2QQqwdxAVtg58C8ZFaHtlgvvX_wOXIYVH</recordid><startdate>202209</startdate><enddate>202209</enddate><creator>Nagy, Dániel</creator><creator>Plavecz, Lambert</creator><creator>Hegedűs, Ferenc</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>6I.</scope><scope>AAFTH</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-7077-5042</orcidid><orcidid>https://orcid.org/0000-0002-2939-7692</orcidid></search><sort><creationdate>202209</creationdate><title>The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs</title><author>Nagy, Dániel ; Plavecz, Lambert ; Hegedűs, Ferenc</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c376t-fb84977c39445007ab999f043ed36e203673fe31de98d82653773efd5ab291803</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Asynchronous</topic><topic>C++ (programming language)</topic><topic>Central processing units</topic><topic>Computer peripherals</topic><topic>CPU programming</topic><topic>CPUs</topic><topic>Differential equations</topic><topic>Event handling</topic><topic>GPU programming</topic><topic>Graphics processing units</topic><topic>Hardware</topic><topic>Lorenz 
equations</topic><topic>Microprocessors</topic><topic>Non-stiff problems</topic><topic>Optimization</topic><topic>Ordinary differential equations</topic><topic>Relief valves</topic><topic>Runge-Kutta method</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nagy, Dániel</creatorcontrib><creatorcontrib>Plavecz, Lambert</creatorcontrib><creatorcontrib>Hegedűs, Ferenc</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>CrossRef</collection><jtitle>Communications in nonlinear science & numerical simulation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nagy, Dániel</au><au>Plavecz, Lambert</au><au>Hegedűs, Ferenc</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs</atitle><jtitle>Communications in nonlinear science & numerical simulation</jtitle><date>2022-09</date><risdate>2022</risdate><volume>112</volume><spage>106521</spage><pages>106521-</pages><artnum>106521</artnum><issn>1007-5704</issn><eissn>1878-7274</eissn><abstract>This paper discusses the main performance barriers for solving a large number of independent ordinary differential equation systems on processors (CPU) and graphics cards (GPU). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power. The main barriers identified by the detailed analysing of the hardware architectures and profiling using hardware performance monitoring units are as follows. First, exploitation of the SIMD capabilities of the CPU via vector registers. The solution is to implement/enforce explicit vectorisation. 
Second, hiding instruction latencies on both CPUs and GPUs that can be achieved with increasing (instruction-level) parallelism. Third, the efficient handling of large timescale differences or event handling using the massively parallel architecture of GPUs. A viable option to overcome this difficulty is asynchronous time stepping. The above optimisation techniques and their implementation possibilities are discussed and tested on three program packages: MPGOS written in C++ and specialised only for GPUs; ODEINT implemented in C++, which supports execution on both CPUs and GPUs; finally, DifferentialEquations.jl written in Julia that also supports execution on both CPUs and GPUs. The tested systems (Lorenz equation, Keller–Miksis equation and a pressure relief valve model) are non-stiff and have low dimension. Thus, the performance of the codes are not limited by memory bandwidth, and Runge–Kutta type solvers are efficient and suitable choices. The employed hardware are an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per cores and an Nvidia GeForce Titan Black GPU that has a total of 1707 GFLOPS peak double-precision performance. •Solving large number of independent ordinary differential equations.•Massively parallel GPU programming.•Explicit vectorisation in the case of CPUs.•Synchronous and asynchronous parallelisation.•Adaptive solvers and impact dynamics.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.cnsns.2022.106521</doi><orcidid>https://orcid.org/0000-0002-7077-5042</orcidid><orcidid>https://orcid.org/0000-0002-2939-7692</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1007-5704
ispartof	Communications in nonlinear science & numerical simulation, 2022-09, Vol.112, p.106521, Article 106521
issn	1007-5704 1878-7274
language	eng
recordid	cdi_proquest_journals_2684208578
source	Elsevier ScienceDirect Journals
subjects	Asynchronous; C++ (programming language); Central processing units; Computer peripherals; CPU programming; CPUs; Differential equations; Event handling; GPU programming; Graphics processing units; Hardware; Lorenz equations; Microprocessors; Non-stiff problems; Optimization; Ordinary differential equations; Relief valves; Runge-Kutta method
title	The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T20%3A05%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20art%20of%20solving%20a%20large%20number%20of%20non-stiff,%20low-dimensional%20ordinary%20differential%20equation%20systems%20on%20GPUs%20and%20CPUs&rft.jtitle=Communications%20in%20nonlinear%20science%20&%20numerical%20simulation&rft.au=Nagy,%20D%C3%A1niel&rft.date=2022-09&rft.volume=112&rft.spage=106521&rft.pages=106521-&rft.artnum=106521&rft.issn=1007-5704&rft.eissn=1878-7274&rft_id=info:doi/10.1016/j.cnsns.2022.106521&rft_dat=%3Cproquest_cross%3E2684208578%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2684208578&rft_id=info:pmid/&rft_els_id=S1007570422001551&rfr_iscdi=true
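
Illustrative sketch (not taken from the paper or from any of the three packages it tests): the description names SIMD utilisation as the first barrier when many independent low-dimensional systems are solved on a CPU. One common way to prepare for vectorisation is to store the ensemble in a structure-of-arrays layout, so that the same state component of neighbouring systems is contiguous in memory and the loop over systems can map onto vector lanes. The C++ fragment below applies a fixed-step classical Runge-Kutta (RK4) step to a batch of Lorenz systems. All names (LorenzBatch, make_batch, rk4_step_batch), the fixed values sigma = 10 and beta = 8/3, and the choice of a fixed-step method instead of an adaptive solver are assumptions made for illustration. The loop relies on compiler auto-vectorisation, which may or may not happen; it is not the explicit vectorisation discussed in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container: component k of every system
// is stored contiguously, so neighbouring systems occupy neighbouring lanes.
struct LorenzBatch {
    std::vector<double> x, y, z;  // state of each independent system
    std::vector<double> rho;      // per-system control parameter
};

constexpr double kSigma = 10.0;       // assumed classical Lorenz parameters
constexpr double kBeta = 8.0 / 3.0;

// Right-hand side of one Lorenz system.
inline void lorenz_rhs(double x, double y, double z, double rho,
                       double& dx, double& dy, double& dz) {
    dx = kSigma * (y - x);
    dy = x * (rho - z) - y;
    dz = x * y - kBeta * z;
}

// Build n systems starting from (1, 1, 1) with rho spaced linearly
// between rho_min and rho_max.
LorenzBatch make_batch(std::size_t n, double rho_min, double rho_max) {
    LorenzBatch b;
    b.x.assign(n, 1.0);
    b.y.assign(n, 1.0);
    b.z.assign(n, 1.0);
    b.rho.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double t = (n > 1) ? double(i) / double(n - 1) : 0.0;
        b.rho[i] = rho_min + (rho_max - rho_min) * t;
    }
    return b;
}

// One classical RK4 step of size dt for every system in the batch.
// The loop body has no branches and no dependence between iterations,
// which is what makes it a candidate for vectorisation.
void rk4_step_batch(LorenzBatch& b, double dt) {
    const std::size_t n = b.x.size();
    assert(b.y.size() == n && b.z.size() == n && b.rho.size() == n);
    double* x = b.x.data();
    double* y = b.y.data();
    double* z = b.z.data();
    const double* rho = b.rho.data();
    const double h2 = 0.5 * dt;
    const double h6 = dt / 6.0;
    for (std::size_t i = 0; i < n; ++i) {
        double k1x, k1y, k1z, k2x, k2y, k2z, k3x, k3y, k3z, k4x, k4y, k4z;
        lorenz_rhs(x[i], y[i], z[i], rho[i], k1x, k1y, k1z);
        lorenz_rhs(x[i] + h2 * k1x, y[i] + h2 * k1y, z[i] + h2 * k1z,
                   rho[i], k2x, k2y, k2z);
        lorenz_rhs(x[i] + h2 * k2x, y[i] + h2 * k2y, z[i] + h2 * k2z,
                   rho[i], k3x, k3y, k3z);
        lorenz_rhs(x[i] + dt * k3x, y[i] + dt * k3y, z[i] + dt * k3z,
                   rho[i], k4x, k4y, k4z);
        x[i] += h6 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x);
        y[i] += h6 * (k1y + 2.0 * k2y + 2.0 * k3y + k4y);
        z[i] += h6 * (k1z + 2.0 * k2z + 2.0 * k3z + k4z);
    }
}
```

A caller would create the ensemble once, for example with make_batch(1024, 0.0, 28.0), and then call rk4_step_batch repeatedly with a fixed dt. Whether the loop is actually vectorised depends on the compiler and its flags, so inspecting the compiler's vectorisation report is advisable before drawing performance conclusions.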