The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs

This paper discusses the main performance barriers for solving a large number of independent ordinary differential equation systems on processors (CPU) and graphics cards (GPU). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power...

Detailed description

Bibliographic details
Published in: Communications in nonlinear science & numerical simulation 2022-09, Vol.112, p.106521, Article 106521
Main authors: Nagy, Dániel, Plavecz, Lambert, Hegedűs, Ferenc
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page 106521
container_title Communications in nonlinear science & numerical simulation
container_volume 112
creator Nagy, Dániel
Plavecz, Lambert
Hegedűs, Ferenc
description This paper discusses the main performance barriers to solving a large number of independent ordinary differential equation systems on processors (CPUs) and graphics cards (GPUs). With a naïve approach, for instance, CPU utilisation can be as low as 4% of the theoretical peak processing power. The main barriers, identified by a detailed analysis of the hardware architectures and by profiling with hardware performance monitoring units, are as follows. First, exploiting the SIMD capabilities of the CPU via its vector registers; the solution is to implement or enforce explicit vectorisation. Second, hiding instruction latencies on both CPUs and GPUs, which can be achieved by increasing (instruction-level) parallelism. Third, efficiently handling large timescale differences or events on the massively parallel architecture of GPUs; a viable option to overcome this difficulty is asynchronous time stepping. These optimisation techniques and their implementation options are discussed and tested on three program packages: MPGOS, written in C++ and specialised for GPUs only; ODEINT, implemented in C++, which supports execution on both CPUs and GPUs; and DifferentialEquations.jl, written in Julia, which also supports execution on both CPUs and GPUs. The tested systems (the Lorenz equation, the Keller–Miksis equation and a pressure relief valve model) are non-stiff and low-dimensional. Thus, the performance of the codes is not limited by memory bandwidth, and Runge–Kutta type solvers are efficient and suitable choices. The hardware employed consists of an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per core and an Nvidia GeForce Titan Black GPU with a total peak double-precision performance of 1707 GFLOPS.
• Solving a large number of independent ordinary differential equations.
• Massively parallel GPU programming.
• Explicit vectorisation in the case of CPUs.
• Synchronous and asynchronous parallelisation.
• Adaptive solvers and impact dynamics.
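The peak figures quoted in the description can be reproduced from per-cycle throughput. The hardware parameters below are assumptions taken from vendor specifications, not from this record: the Ivy Bridge-E i7-4820K has four cores with 256-bit AVX but no FMA, i.e. one 4-wide double-precision add and one 4-wide multiply per cycle (8 FLOP/cycle), so the quoted 30.4 GFLOPS per core implies a 3.8 GHz clock; the Titan Black has 2880 CUDA cores, runs double precision at 1/3 of the single-precision rate, counts an FMA as 2 FLOPs and has an 889 MHz base clock.

```latex
\begin{aligned}
P_{\text{core}}   &= 4 \times 2 \times f_{\text{CPU}} = 8 \times 3.8\,\text{GHz} = 30.4\,\text{GFLOPS},\\
P_{\text{scalar}} &= 1 \times 2 \times 3.8\,\text{GHz} = 7.6\,\text{GFLOPS} = \tfrac{1}{4}\,P_{\text{core}},\\
0.04\,P_{\text{core}} &\approx 1.2\,\text{GFLOPS},\\
P_{\text{CPU}}    &= 4\ \text{cores} \times 30.4\,\text{GFLOPS} = 121.6\,\text{GFLOPS},\\
P_{\text{GPU}}    &= \tfrac{2880}{3} \times 2 \times 0.889\,\text{GHz} = 1706.88\,\text{GFLOPS} \approx 1707\,\text{GFLOPS}.
\end{aligned}
```

Under these assumptions, purely scalar code can reach at most a quarter of the per-core peak even before latency effects, which is why vector registers must be exploited; the 4% figure corresponds to roughly 1.2 GFLOPS on one core.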
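A minimal sketch of a SIMD-friendly data layout for an ensemble of independent Lorenz systems, assuming a structure-of-arrays layout, a fixed step size and the classic fourth-order Runge–Kutta scheme. The names `Ensemble`, `lorenz_rhs`, `rk4_step` and `integrate` are hypothetical; this is not the code of MPGOS, ODEINT or DifferentialEquations.jl. It relies on compiler auto-vectorisation rather than the explicit vectorisation the paper recommends, and only shows the layout that makes either approach possible; whether the loop is actually vectorised depends on the compiler and flags (e.g. `-O3 -mavx` with GCC or Clang).

```cpp
#include <cstddef>
#include <vector>

// Time derivative of a single Lorenz system (scalar code, inlined into the loop below).
struct Deriv { double dx, dy, dz; };

inline Deriv lorenz_rhs(double x, double y, double z,
                        double sigma, double rho, double beta) {
    return {sigma * (y - x), x * (rho - z) - y, x * y - beta * z};
}

// Structure-of-arrays (SoA) storage: one contiguous array per state variable and
// per varied parameter; system i occupies index i of every array.
// All four vectors are assumed to have the same length.
struct Ensemble {
    std::vector<double> x, y, z, rho;
};

// One classic RK4 step of size h for every system in the ensemble.
// Iterations are independent and access unit-stride data, so the loop is a
// candidate for auto-vectorisation (one SIMD lane per system), and the
// independent iterations let the core overlap one system's dependency chain
// with work from another.
void rk4_step(Ensemble& e, double h, double sigma, double beta) {
    const std::size_t n = e.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double x0 = e.x[i], y0 = e.y[i], z0 = e.z[i], r = e.rho[i];
        const Deriv k1 = lorenz_rhs(x0, y0, z0, sigma, r, beta);
        const Deriv k2 = lorenz_rhs(x0 + 0.5 * h * k1.dx, y0 + 0.5 * h * k1.dy,
                                    z0 + 0.5 * h * k1.dz, sigma, r, beta);
        const Deriv k3 = lorenz_rhs(x0 + 0.5 * h * k2.dx, y0 + 0.5 * h * k2.dy,
                                    z0 + 0.5 * h * k2.dz, sigma, r, beta);
        const Deriv k4 = lorenz_rhs(x0 + h * k3.dx, y0 + h * k3.dy,
                                    z0 + h * k3.dz, sigma, r, beta);
        e.x[i] = x0 + h / 6.0 * (k1.dx + 2.0 * k2.dx + 2.0 * k3.dx + k4.dx);
        e.y[i] = y0 + h / 6.0 * (k1.dy + 2.0 * k2.dy + 2.0 * k3.dy + k4.dy);
        e.z[i] = z0 + h / 6.0 * (k1.dz + 2.0 * k2.dz + 2.0 * k3.dz + k4.dz);
    }
}

// Fixed-step integration of the whole ensemble (all systems share the step h).
void integrate(Ensemble& e, double h, std::size_t steps,
               double sigma = 10.0, double beta = 8.0 / 3.0) {
    for (std::size_t s = 0; s < steps; ++s) rk4_step(e, h, sigma, beta);
}
```

Usage: fill `x`, `y`, `z` and `rho` with one entry per system, then call `integrate(e, 0.01, 5000)` to advance every system to t = 50. Because the systems are independent and low-dimensional, the working set stays small, consistent with the description's remark that such problems are not memory-bandwidth bound.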
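Why large timescale differences hurt a lockstep scheme can be illustrated with a toy step-count model. It assumes (an illustrative assumption, not the paper's definition) that in synchronous stepping all systems advance with one shared step size set by the most demanding system, while in asynchronous stepping each system keeps its own step size. `StepCounts` and `count_steps` are hypothetical names, and the model ignores adaptivity, events and hardware effects.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Total number of steps performed by the whole ensemble under each scheme.
struct StepCounts {
    std::size_t synchronous;
    std::size_t asynchronous;
};

// Toy cost model (assumption): system i needs step size dt[i] over [0, t_end].
// Synchronous: every system advances with the smallest step in the ensemble.
// Asynchronous: every system advances with its own step.
StepCounts count_steps(const std::vector<double>& dt, double t_end) {
    if (dt.empty()) return {0, 0};
    const double dt_min = *std::min_element(dt.begin(), dt.end());
    const auto steps = [t_end](double h) {
        return static_cast<std::size_t>(std::ceil(t_end / h));
    };
    StepCounts c{dt.size() * steps(dt_min), 0};
    for (double h : dt) c.asynchronous += steps(h);
    return c;
}
```

For example, if one of four systems needs a step of 1/1024 and the other three a step of 1/8 over t ∈ [0, 1], the shared step costs 4 × 1024 = 4096 steps in total, whereas independent steps cost 1024 + 3 × 8 = 1048.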
doi_str_mv 10.1016/j.cnsns.2022.106521
format Article
publisher Elsevier B.V., Amsterdam
rights 2022 The Authors
Copyright Elsevier Science Ltd. Sep 2022
toplevel peer_reviewed
online_resources
oa free_for_read
collection ScienceDirect Open Access Titles
CrossRef
orcidid https://orcid.org/0000-0002-7077-5042
https://orcid.org/0000-0002-2939-7692
linktohtml https://www.sciencedirect.com/science/article/pii/S1007570422001551
fulltext fulltext
identifier ISSN: 1007-5704
ispartof Communications in nonlinear science & numerical simulation, 2022-09, Vol.112, p.106521, Article 106521
issn 1007-5704
1878-7274
language eng
recordid cdi_proquest_journals_2684208578
source Elsevier ScienceDirect Journals
subjects Asynchronous
C++ (programming language)
Central processing units
Computer peripherals
CPU programming
CPUs
Differential equations
Event handling
GPU programming
Graphics processing units
Hardware
Lorenz equations
Microprocessors
Non-stiff problems
Optimization
Ordinary differential equations
Relief valves
Runge-Kutta method
title The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T20%3A05%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20art%20of%20solving%20a%20large%20number%20of%20non-stiff,%20low-dimensional%20ordinary%20differential%20equation%20systems%20on%20GPUs%20and%20CPUs&rft.jtitle=Communications%20in%20nonlinear%20science%20&%20numerical%20simulation&rft.au=Nagy,%20D%C3%A1niel&rft.date=2022-09&rft.volume=112&rft.spage=106521&rft.pages=106521-&rft.artnum=106521&rft.issn=1007-5704&rft.eissn=1878-7274&rft_id=info:doi/10.1016/j.cnsns.2022.106521&rft_dat=%3Cproquest_cross%3E2684208578%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2684208578&rft_id=info:pmid/&rft_els_id=S1007570422001551&rfr_iscdi=true