Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in effici...

Bibliographic details
Published in:	Computer physics communications 2015-09, Vol.194, p.18-32
Main authors:	Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
Format:	Article
Language:	eng
Subjects:	Architecture (computers); Central processing units; Computation; Computer simulation; CUDA; Devices; GPU; Graphics processing units; Hydrodynamics; MIC; OpenMP; Particle simulation; Processors; SPH
Online access:	Full text

container_end_page	32
container_issue
container_start_page	18
container_title	Computer physics communications
container_volume	194
creator	Nishiura, Daisuke ; Furuichi, Mikito ; Sakaguchi, Hide
description	The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
doi_str_mv	10.1016/j.cpc.2015.04.006
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1770292029</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S001046551500137X</els_id><sourcerecordid>1770292029</sourcerecordid><originalsourceid>FETCH-LOGICAL-c509t-31453637ab3d1b7f08dfde8d73d22d99c41eca946d33ab888195433123b198713</originalsourceid><addsrcrecordid>eNp9kLFu2zAQhomgAeo6fYBuHLtIPYqUKKJTYbRJAANZkpmgyVNNgxQVUirgt69sd-5wuOX__sN9hHxhUDNg3bdTbSdbN8DaGkQN0N2RDeulqholxAeyAWBQia5tP5JPpZwAQErFN2TcpTgts5l9Gk2gE-Yh5WhGizQN1NASU5qP6Ohk8uxtQHo8u5zceTTR20KLj0u40nQFaTmajK6KGFM-XxgTAgZqr0f8-PuB3A8mFPz8b2_J26-fr7unav_y-Lz7sa9sC2quOBMt77g0B-7YQQ7Qu8Fh7yR3TeOUsoKhNUp0jnNz6PueqVZwzhp-YKqXjG_J11vvlNP7gmXW0ReLIZgR01I0kxIa1ayzRtktanMqJeOgp-yjyWfNQF_c6pNe3eqLWw1Cr25X5vuNwfWHPx6zLtbjKs35jHbWLvn_0H8BTbiD4g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1770292029</pqid></control><display><type>article</type><title>Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Nishiura, Daisuke ; Furuichi, Mikito ; Sakaguchi, Hide</creator><creatorcontrib>Nishiura, Daisuke ; Furuichi, Mikito ; Sakaguchi, Hide</creatorcontrib><description>The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. 
We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.</description><identifier>ISSN: 0010-4655</identifier><identifier>EISSN: 1879-2944</identifier><identifier>DOI: 10.1016/j.cpc.2015.04.006</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Architecture (computers) ; Central processing units ; Computation ; Computer simulation ; CUDA ; Devices ; GPU ; Graphics processing units ; Hydrodynamics ; MIC ; OpenMP ; Particle simulation ; Processors ; SPH</subject><ispartof>Computer physics communications, 2015-09, Vol.194, p.18-32</ispartof><rights>2015 The 
Authors</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c509t-31453637ab3d1b7f08dfde8d73d22d99c41eca946d33ab888195433123b198713</citedby><cites>FETCH-LOGICAL-c509t-31453637ab3d1b7f08dfde8d73d22d99c41eca946d33ab888195433123b198713</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.cpc.2015.04.006$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Nishiura, Daisuke</creatorcontrib><creatorcontrib>Furuichi, Mikito</creatorcontrib><creatorcontrib>Sakaguchi, Hide</creatorcontrib><title>Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing</title><title>Computer physics communications</title><description>The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. 
In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.</description><subject>Architecture (computers)</subject><subject>Central processing units</subject><subject>Computation</subject><subject>Computer simulation</subject><subject>CUDA</subject><subject>Devices</subject><subject>GPU</subject><subject>Graphics processing units</subject><subject>Hydrodynamics</subject><subject>MIC</subject><subject>OpenMP</subject><subject>Particle simulation</subject><subject>Processors</subject><subject>SPH</subject><issn>0010-4655</issn><issn>1879-2944</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNp9kLFu2zAQhomgAeo6fYBuHLtIPYqUKKJTYbRJAANZkpmgyVNNgxQVUirgt69sd-5wuOX__sN9hHxhUDNg3bdTbSdbN8DaGkQN0N2RDeulqholxAeyAWBQia5tP5JPpZwAQErFN2TcpTgts5l9Gk2gE-Yh5WhGizQN1NASU5qP6Ohk8uxtQHo8u5zceTTR20KLj0u40nQFaTmajK6KGFM-XxgTAgZqr0f8-PuB3A8mFPz8b2_J26-fr7unav_y-Lz7sa9sC2quOBMt77g0B-7YQQ7Qu8Fh7yR3TeOUsoKhNUp0jnNz6PueqVZwzhp-YKqXjG_J11vvlNP7gmXW0ReLIZgR01I0kxIa1ayzRtktanMqJeOgp-yjyWfNQF_c6pNe3eqLWw1Cr25X5vuNwfWHPx6zLtbjKs35jHbWLvn_0H8BTbiD4g</recordid><startdate>20150901</startdate><enddate>20150901</enddate><creator>Nishiura, Daisuke</creator><creator>Furuichi, Mikito</creator><creator>Sakaguchi, Hide</creator><general>Elsevier 
B.V</general><scope>6I.</scope><scope>AAFTH</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7U5</scope><scope>8FD</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20150901</creationdate><title>Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing</title><author>Nishiura, Daisuke ; Furuichi, Mikito ; Sakaguchi, Hide</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c509t-31453637ab3d1b7f08dfde8d73d22d99c41eca946d33ab888195433123b198713</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Architecture (computers)</topic><topic>Central processing units</topic><topic>Computation</topic><topic>Computer simulation</topic><topic>CUDA</topic><topic>Devices</topic><topic>GPU</topic><topic>Graphics processing units</topic><topic>Hydrodynamics</topic><topic>MIC</topic><topic>OpenMP</topic><topic>Particle simulation</topic><topic>Processors</topic><topic>SPH</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nishiura, Daisuke</creatorcontrib><creatorcontrib>Furuichi, Mikito</creatorcontrib><creatorcontrib>Sakaguchi, Hide</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>Computer physics communications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nishiura, Daisuke</au><au>Furuichi, Mikito</au><au>Sakaguchi, Hide</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing</atitle><jtitle>Computer physics communications</jtitle><date>2015-09-01</date><risdate>2015</risdate><volume>194</volume><spage>18</spage><epage>32</epage><pages>18-32</pages><issn>0010-4655</issn><eissn>1879-2944</eissn><abstract>The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. 
This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.cpc.2015.04.006</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0010-4655
ispartof	Computer physics communications, 2015-09, Vol.194, p.18-32
issn	0010-4655 1879-2944
language	eng
recordid	cdi_proquest_miscellaneous_1770292029
source	Elsevier ScienceDirect Journals Complete
subjects	Architecture (computers) ; Central processing units ; Computation ; Computer simulation ; CUDA ; Devices ; GPU ; Graphics processing units ; Hydrodynamics ; MIC ; OpenMP ; Particle simulation ; Processors ; SPH
title	Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T14%3A43%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Computational%20performance%20of%20a%20smoothed%20particle%20hydrodynamics%20simulation%20for%20shared-memory%20parallel%20computing&rft.jtitle=Computer%20physics%20communications&rft.au=Nishiura,%20Daisuke&rft.date=2015-09-01&rft.volume=194&rft.spage=18&rft.epage=32&rft.pages=18-32&rft.issn=0010-4655&rft.eissn=1879-2944&rft_id=info:doi/10.1016/j.cpc.2015.04.006&rft_dat=%3Cproquest_cross%3E1770292029%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1770292029&rft_id=info:pmid/&rft_els_id=S001046551500137X&rfr_iscdi=true