Generation of large finite-element matrices on multiple graphics processors

SUMMARY: This paper presents techniques for generating very large finite‐element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to...

Bibliographic details
Published in:	International journal for numerical methods in engineering 2013-04, Vol.94 (2), p.204-220
Main authors:	Dziekonski, A.; Sypek, P.; Lamecki, A.; Mrozowski, M.
Format:	Article
Language:	eng
Subjects:	Boards; Central processing units; Exact sciences and technology; Fermi; Finite element method; Mathematical analysis; Mathematics; matrix generation; Methods of scientific computing (including symbolic computation, algebraic computation); multicore CPUs; multiple GPUs; Numerical analysis. Scientific computation; parallel computing; Processors; Random access memory; Reproduction; Sciences and techniques of general use; Workstations
Online access:	Full text

container_end_page	220
container_issue	2
container_start_page	204
container_title	International journal for numerical methods in engineering
container_volume	94
creator	Dziekonski, A.; Sypek, P.; Lamecki, A.; Mrozowski, M.
description	SUMMARY: This paper presents techniques for generating very large finite‐element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the limited memory of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite‐element analysis iteratively on several GPUs, and to use the graphics accelerators concurrently with CPUs that collect and add the matrix fragments using a fast multithreaded procedure. The threads are scheduled in such a way that the CPU operations do not affect the performance of the process, and the GPUs are idle only when data are being transferred from GPU to CPU. This approach is verified on two workstations. The first consists of two 6‐core Intel Xeon X5690 processors with two Fermi GPUs; each GPU is a GeForce GTX 590 with two graphics processors and 1.5 GB of fast RAM. The second workstation is equipped with two Tesla C2075 boards carrying 6 GB of RAM each and two 12‐core Opteron 6174s. For the latter setup, we demonstrate the fast generation of sparse finite‐element matrices as large as 10 million unknowns, with over 1 billion nonzero entries. Compared with the single‐threaded and multithreaded CPU implementations, the GPU‐based version of the algorithm presented in this paper reduces the finite‐element matrix‐generation time in double precision by factors of 100 and 30, respectively. Copyright © 2012 John Wiley & Sons, Ltd.
doi_str_mv	10.1002/nme.4452
format	Article
addtitle	Int. J. Numer. Meth. Engng
coden	IJNMBH
date	2013-04-13
eissn	1097-0207
publisher	Chichester: Blackwell Publishing Ltd
toplevel	peer_reviewed
tpages	17
fulltext	fulltext
identifier	ISSN: 0029-5981
ispartof	International journal for numerical methods in engineering, 2013-04, Vol.94 (2), p.204-220
issn	0029-5981 1097-0207
language	eng
recordid	cdi_proquest_miscellaneous_1671412855
source	Wiley Online Library Journals Frontfile Complete
subjects	Boards; Central processing units; Exact sciences and technology; Fermi; Finite element method; Mathematical analysis; Mathematics; matrix generation; Methods of scientific computing (including symbolic computation, algebraic computation); multicore CPUs; multiple GPUs; Numerical analysis. Scientific computation; parallel computing; Processors; Random access memory; Reproduction; Sciences and techniques of general use; Workstations
title	Generation of large finite-element matrices on multiple graphics processors
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T08%3A45%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generation%20of%20large%20finite-element%20matrices%20on%20multiple%20graphics%20processors&rft.jtitle=International%20journal%20for%20numerical%20methods%20in%20engineering&rft.au=Dziekonski,%20A.&rft.date=2013-04-13&rft.volume=94&rft.issue=2&rft.spage=204&rft.epage=220&rft.pages=204-220&rft.issn=0029-5981&rft.eissn=1097-0207&rft.coden=IJNMBH&rft_id=info:doi/10.1002/nme.4452&rft_dat=%3Cproquest_cross%3E2925905911%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1319223037&rft_id=info:pmid/&rfr_iscdi=true
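
Illustrative sketch (not part of the catalog record, and not the authors' code): the abstract describes generating matrix fragments iteratively on several GPUs while CPU threads collect and add the fragments concurrently. The Python sketch below imitates only that producer/consumer structure, under loud assumptions: plain threads stand in for the GPUs, a single consumer stands in for the paper's multithreaded CPU procedure, and the "kernel" produces the textbook 2x2 stiffness matrix of a 1D linear element rather than the element matrices used in the paper. All names (`generate_batch`, `gpu_worker`, `assemble`) are hypothetical.

```python
import queue
import threading
from collections import defaultdict


def generate_batch(first_elem, last_elem):
    """Hypothetical stand-in for a GPU kernel.

    Returns one matrix fragment as COO triplets (rows, cols, vals) for the
    1D linear elements first_elem .. last_elem-1. Element e couples nodes
    e and e+1 with the local matrix [[1, -1], [-1, 1]].
    """
    rows, cols, vals = [], [], []
    for e in range(first_elem, last_elem):
        nodes = (e, e + 1)
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                rows.append(i)
                cols.append(j)
                vals.append(1.0 if a == b else -1.0)
    return rows, cols, vals


def gpu_worker(batches, out_q):
    """Producer: generates its share of batches, one fragment at a time."""
    for first, last in batches:
        out_q.put(generate_batch(first, last))
    out_q.put(None)  # sentinel: this worker has no more fragments


def assemble(n_elems, n_workers=2, batch_size=4):
    """Consumer: collects fragments and adds duplicate entries together."""
    # A bounded queue models the limited buffer space between the
    # generators and the CPU side (an assumption of this sketch).
    out_q = queue.Queue(maxsize=2 * n_workers)
    batches = [(s, min(s + batch_size, n_elems))
               for s in range(0, n_elems, batch_size)]
    workers = [
        threading.Thread(target=gpu_worker,
                         args=(batches[w::n_workers], out_q))
        for w in range(n_workers)
    ]
    for t in workers:
        t.start()

    matrix = defaultdict(float)  # (row, col) -> accumulated value
    finished = 0
    while finished < n_workers:
        fragment = out_q.get()
        if fragment is None:
            finished += 1
            continue
        for i, j, v in zip(*fragment):
            matrix[(i, j)] += v  # "addition of the matrix fragments"

    for t in workers:
        t.join()
    return dict(matrix)


if __name__ == "__main__":
    K = assemble(10)
    print(len(K), K[(5, 5)], K[(5, 6)])
```

Generating in batches means no single fragment has to hold the whole matrix, which is the point of the iterative scheme when accelerator memory is small. A dictionary keyed by (row, col) is chosen here only for brevity; a production code would more likely accumulate into a compressed sparse format.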