The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs
This paper discusses the main performance barriers for solving a large number of independent ordinary differential equation systems on processors (CPU) and graphics cards (GPU). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power...
Published in: | Communications in nonlinear science & numerical simulation 2022-09, Vol.112, p.106521, Article 106521 |
---|---|
Main authors: | Nagy, Dániel ; Plavecz, Lambert ; Hegedűs, Ferenc |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 106521 |
container_title | Communications in nonlinear science & numerical simulation |
container_volume | 112 |
creator | Nagy, Dániel ; Plavecz, Lambert ; Hegedűs, Ferenc |
description | This paper discusses the main performance barriers to solving a large number of independent ordinary differential equation systems on processors (CPUs) and graphics cards (GPUs). With a naïve approach, for instance, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power. The main barriers, identified by a detailed analysis of the hardware architectures and by profiling with hardware performance monitoring units, are as follows. First, exploiting the SIMD capabilities of the CPU via vector registers; the solution is to implement or enforce explicit vectorisation. Second, hiding instruction latencies on both CPUs and GPUs, which can be achieved by increasing instruction-level parallelism. Third, handling large timescale differences or events efficiently on the massively parallel architecture of GPUs; a viable option to overcome this difficulty is asynchronous time stepping. These optimisation techniques and their implementation possibilities are discussed and tested on three program packages: MPGOS, written in C++ and specialised only for GPUs; ODEINT, implemented in C++, which supports execution on both CPUs and GPUs; and DifferentialEquations.jl, written in Julia, which also supports execution on both CPUs and GPUs. The tested systems (the Lorenz equation, the Keller–Miksis equation and a pressure relief valve model) are non-stiff and low-dimensional. Thus, the performance of the codes is not limited by memory bandwidth, and Runge–Kutta type solvers are efficient and suitable choices. The hardware employed is an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per core and an Nvidia GeForce Titan Black GPU with a total of 1707 GFLOPS peak double-precision performance.
• Solving a large number of independent ordinary differential equations. • Massively parallel GPU programming. • Explicit vectorisation in the case of CPUs. • Synchronous and asynchronous parallelisation. • Adaptive solvers and impact dynamics. |
doi_str_mv | 10.1016/j.cnsns.2022.106521 |
format | Article |
fullrecord | sourceid: proquest_cross ; recordid: TN_cdi_proquest_journals_2684208578 ; sourcerecordid: 2684208578 ; els_id: S1007570422001551 ; sourcetype: Aggregation Database ; recordtype: article ; source: Elsevier ScienceDirect Journals ; ISSN: 1007-5704 ; EISSN: 1878-7274 ; DOI: 10.1016/j.cnsns.2022.106521 ; publisher: Amsterdam: Elsevier B.V ; rights: 2022 The Authors ; Copyright Elsevier Science Ltd. Sep 2022 ; peer_reviewed ; oa: free_for_read ; orcid: https://orcid.org/0000-0002-7077-5042 ; https://orcid.org/0000-0002-2939-7692 ; linktohtml: https://www.sciencedirect.com/science/article/pii/S1007570422001551 ; collections: ScienceDirect Open Access Titles ; Elsevier:ScienceDirect:Open Access ; CrossRef ; genre: article ; ristype: JOUR ; date: 2022-09 ; volume: 112 ; spage: 106521 ; artnum: 106521 ; delcategory: Remote Search Resource |
fulltext | fulltext |
identifier | ISSN: 1007-5704 |
ispartof | Communications in nonlinear science & numerical simulation, 2022-09, Vol.112, p.106521, Article 106521 |
issn | 1007-5704 ; 1878-7274 |
language | eng |
recordid | cdi_proquest_journals_2684208578 |
source | Elsevier ScienceDirect Journals |
subjects | Asynchronous ; C++ (programming language) ; Central processing units ; Computer peripherals ; CPU programming ; CPUs ; Differential equations ; Event handling ; GPU programming ; Graphics processing units ; Hardware ; Lorenz equations ; Microprocessors ; Non-stiff problems ; Optimization ; Ordinary differential equations ; Relief valves ; Runge-Kutta method |
title | The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T20%3A05%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20art%20of%20solving%20a%20large%20number%20of%20non-stiff,%20low-dimensional%20ordinary%20differential%20equation%20systems%20on%20GPUs%20and%20CPUs&rft.jtitle=Communications%20in%20nonlinear%20science%20&%20numerical%20simulation&rft.au=Nagy,%20D%C3%A1niel&rft.date=2022-09&rft.volume=112&rft.spage=106521&rft.pages=106521-&rft.artnum=106521&rft.issn=1007-5704&rft.eissn=1878-7274&rft_id=info:doi/10.1016/j.cnsns.2022.106521&rft_dat=%3Cproquest_cross%3E2684208578%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2684208578&rft_id=info:pmid/&rft_els_id=S1007570422001551&rfr_iscdi=true |
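
The description field above names SIMD use on the CPU as the first performance barrier when many independent systems are solved. As a rough illustration only, the following C++ sketch advances a batch of Lorenz systems with a fixed-step classical Runge–Kutta (RK4) step, using a structure-of-arrays layout with the loop over systems innermost so that a compiler may auto-vectorise it. This is a simpler stand-in for the explicit vectorisation the abstract refers to, not that technique itself. The names `LorenzBatch`, `make_batch` and `rk4_step`, the fixed parameters and the per-system `rho` are assumptions made for this example; none of it is taken from MPGOS, ODEINT or DifferentialEquations.jl.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-arrays storage (hypothetical type): each state component of
// all systems is contiguous, so neighbouring iterations touch neighbouring memory.
struct LorenzBatch {
    std::vector<double> x, y, z;  // state of each independent system
    std::vector<double> rho;      // one control parameter per system (assumed setup)
};

// Classical Lorenz parameters, shared by all systems in this sketch.
constexpr double kSigma = 10.0;
constexpr double kBeta = 8.0 / 3.0;

// Right-hand side of a single Lorenz system.
inline void lorenz_rhs(double x, double y, double z, double rho,
                       double& dx, double& dy, double& dz) {
    dx = kSigma * (y - x);
    dy = x * (rho - z) - y;
    dz = x * y - kBeta * z;
}

// Build n systems with identical initial state (1, 1, 1) and rho spaced
// linearly between rho_min and rho_max.
LorenzBatch make_batch(std::size_t n, double rho_min, double rho_max) {
    LorenzBatch b;
    b.x.assign(n, 1.0);
    b.y.assign(n, 1.0);
    b.z.assign(n, 1.0);
    b.rho.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        b.rho[i] = (n > 1)
            ? rho_min + (rho_max - rho_min) * static_cast<double>(i) / static_cast<double>(n - 1)
            : rho_min;
    }
    return b;
}

// One fixed-step RK4 step for every system. The iterations are independent,
// which is what allows a compiler to map them onto SIMD lanes.
void rk4_step(LorenzBatch& b, double dt) {
    const std::size_t n = b.x.size();
    assert(b.y.size() == n && b.z.size() == n && b.rho.size() == n);
    double* x = b.x.data();
    double* y = b.y.data();
    double* z = b.z.data();
    const double* rho = b.rho.data();
    const double h2 = 0.5 * dt;
    const double h6 = dt / 6.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double r = rho[i];
        double k1x, k1y, k1z, k2x, k2y, k2z, k3x, k3y, k3z, k4x, k4y, k4z;
        lorenz_rhs(x[i], y[i], z[i], r, k1x, k1y, k1z);
        lorenz_rhs(x[i] + h2 * k1x, y[i] + h2 * k1y, z[i] + h2 * k1z, r, k2x, k2y, k2z);
        lorenz_rhs(x[i] + h2 * k2x, y[i] + h2 * k2y, z[i] + h2 * k2z, r, k3x, k3y, k3z);
        lorenz_rhs(x[i] + dt * k3x, y[i] + dt * k3y, z[i] + dt * k3z, r, k4x, k4y, k4z);
        x[i] += h6 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x);
        y[i] += h6 * (k1y + 2.0 * k2y + 2.0 * k3y + k4y);
        z[i] += h6 * (k1z + 2.0 * k2z + 2.0 * k3z + k4z);
    }
}
```

Calling `rk4_step(batch, dt)` repeatedly advances all systems in lockstep with the same step size. That synchronous stepping is the simplest case; the adaptive and asynchronous time stepping mentioned in the abstract would need per-system step sizes and is not shown here. Whether the loop is actually vectorised depends on the compiler and its flags.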