Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators

We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Differen...

Detailed description

Bibliographic details
Published in:	ACM transactions on mathematical software 2019-08, Vol.45 (3), p.1-40, Article 29
Main authors:	Kronbichler, Martin ; Kormann, Katharina
Format:	Article
Language:	eng
Subjects:	Architectures ; Computer systems organization ; Differential equations ; Mathematical analysis ; Mathematical software ; Mathematical software performance ; Mathematics of computing ; Multicore architectures ; Parallel architectures ; Partial differential equations ; Solvers
Online access:	Full text

container_end_page	40
container_issue	3
container_start_page	1
container_title	ACM transactions on mathematical software
container_volume	45
creator	Kronbichler, Martin ; Kormann, Katharina
description	We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.
doi_str_mv	10.1145/3325864
format	Article
fullrecord	<record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3325864</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3325864</sourcerecordid><originalsourceid>FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</originalsourceid><addsrcrecordid>eNo90D1PwzAYBGALgUQoiJ3JG1Pgff0RJyMqTUEqdIE5so0tGdKksh0E_56iFqYb7tENR8glwg2ikLecM1lX4ogUKKUqFWvkMSkAmrrkEuCUnKX0DgAMFRbkudUp0yedY_gq2-gcXXzqftI5jAMdPb0PyY5DDsM0Tokude_iRxhoG4aQd7Z3Gzdkut66qPMY0zk58bpP7uKQM_LaLl7mD-VqvXyc361KzYTIpauNZ8pYJarKKGx87ZiFWllsjEGJHLQAFJWoGWrhOFPgrTFMvFUNU43jM3K937VxTCk6321j2Oj43SF0vzd0hxt28movtd38o7_yByiDVuI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators</title><source>ACM Digital Library Complete</source><creator>Kronbichler, Martin ; Kormann, Katharina</creator><creatorcontrib>Kronbichler, Martin ; Kormann, Katharina</creatorcontrib><description>We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. 
On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.</description><identifier>ISSN: 0098-3500</identifier><identifier>EISSN: 1557-7295</identifier><identifier>DOI: 10.1145/3325864</identifier><language>eng</language><publisher>New York, NY, USA: ACM</publisher><subject>Architectures ; Computer systems organization ; Differential equations ; Mathematical analysis ; Mathematical software ; Mathematical software performance ; Mathematics of computing ; Multicore architectures ; Parallel architectures ; Partial differential equations ; Solvers</subject><ispartof>ACM transactions on mathematical software, 2019-08, Vol.45 (3), p.1-40, Article 29</ispartof><rights>ACM</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</citedby><cites>FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</cites><orcidid>0000-0001-8406-835X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://dl.acm.org/doi/pdf/10.1145/3325864$$EPDF$$P50$$Gacm$$H</linktopdf><link.rule.ids>314,780,784,2282,27924,27925,40196,76228</link.rule.ids></links><search><creatorcontrib>Kronbichler, Martin</creatorcontrib><creatorcontrib>Kormann, Katharina</creatorcontrib><title>Fast Matrix-Free Evaluation 
of Discontinuous Galerkin Finite Element Operators</title><title>ACM transactions on mathematical software</title><addtitle>ACM TOMS</addtitle><description>We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. 
Our implementations are publicly available through the deal.II finite element library.</description><subject>Architectures</subject><subject>Computer systems organization</subject><subject>Differential equations</subject><subject>Mathematical analysis</subject><subject>Mathematical software</subject><subject>Mathematical software performance</subject><subject>Mathematics of computing</subject><subject>Multicore architectures</subject><subject>Parallel architectures</subject><subject>Partial differential equations</subject><subject>Solvers</subject><issn>0098-3500</issn><issn>1557-7295</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNo90D1PwzAYBGALgUQoiJ3JG1Pgff0RJyMqTUEqdIE5so0tGdKksh0E_56iFqYb7tENR8glwg2ikLecM1lX4ogUKKUqFWvkMSkAmrrkEuCUnKX0DgAMFRbkudUp0yedY_gq2-gcXXzqftI5jAMdPb0PyY5DDsM0Tokude_iRxhoG4aQd7Z3Gzdkut66qPMY0zk58bpP7uKQM_LaLl7mD-VqvXyc361KzYTIpauNZ8pYJarKKGx87ZiFWllsjEGJHLQAFJWoGWrhOFPgrTFMvFUNU43jM3K937VxTCk6321j2Oj43SF0vzd0hxt28movtd38o7_yByiDVuI</recordid><startdate>20190801</startdate><enddate>20190801</enddate><creator>Kronbichler, Martin</creator><creator>Kormann, Katharina</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-8406-835X</orcidid></search><sort><creationdate>20190801</creationdate><title>Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators</title><author>Kronbichler, Martin ; Kormann, Katharina</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Architectures</topic><topic>Computer systems organization</topic><topic>Differential equations</topic><topic>Mathematical analysis</topic><topic>Mathematical software</topic><topic>Mathematical 
software performance</topic><topic>Mathematics of computing</topic><topic>Multicore architectures</topic><topic>Parallel architectures</topic><topic>Partial differential equations</topic><topic>Solvers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kronbichler, Martin</creatorcontrib><creatorcontrib>Kormann, Katharina</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on mathematical software</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kronbichler, Martin</au><au>Kormann, Katharina</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators</atitle><jtitle>ACM transactions on mathematical software</jtitle><stitle>ACM TOMS</stitle><date>2019-08-01</date><risdate>2019</risdate><volume>45</volume><issue>3</issue><spage>1</spage><epage>40</epage><pages>1-40</pages><artnum>29</artnum><issn>0098-3500</issn><eissn>1557-7295</eissn><abstract>We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. 
On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.</abstract><cop>New York, NY, USA</cop><pub>ACM</pub><doi>10.1145/3325864</doi><tpages>40</tpages><orcidid>https://orcid.org/0000-0001-8406-835X</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0098-3500
ispartof	ACM transactions on mathematical software, 2019-08, Vol.45 (3), p.1-40, Article 29
issn	0098-3500 1557-7295
language	eng
recordid	cdi_crossref_primary_10_1145_3325864
source	ACM Digital Library Complete
subjects	Architectures ; Computer systems organization ; Differential equations ; Mathematical analysis ; Mathematical software ; Mathematical software performance ; Mathematics of computing ; Multicore architectures ; Parallel architectures ; Partial differential equations ; Solvers
title	Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T00%3A41%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fast%20Matrix-Free%20Evaluation%20of%20Discontinuous%20Galerkin%20Finite%20Element%20Operators&rft.jtitle=ACM%20transactions%20on%20mathematical%20software&rft.au=Kronbichler,%20Martin&rft.date=2019-08-01&rft.volume=45&rft.issue=3&rft.spage=1&rft.epage=40&rft.pages=1-40&rft.artnum=29&rft.issn=0098-3500&rft.eissn=1557-7295&rft_id=info:doi/10.1145/3325864&rft_dat=%3Cacm_cross%3E3325864%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true