Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators

We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Differen...
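
The sum factorization named in the summary can be illustrated with a small sketch. This is not the authors' deal.II implementation; it is a minimal pure-Python example under stated assumptions: a 2D tensor-product element, a hypothetical 1D interpolation matrix `S` (quadrature points by basis functions), and hypothetical helper names `apply_1d`, `interpolate_sum_factorized`, and `interpolate_naive`. It shows that applying `S` once per direction gives the same values as the full tensor-product matrix.

```python
from itertools import product


def apply_1d(S, U, axis):
    """Contract the 1D matrix S (q x n) with one axis of the 2D array U."""
    q, n = len(S), len(S[0])
    if axis == 0:
        # result[i][b] = sum_a S[i][a] * U[a][b]
        cols = len(U[0])
        return [[sum(S[i][a] * U[a][b] for a in range(n)) for b in range(cols)]
                for i in range(q)]
    # axis == 1: result[a][j] = sum_b S[j][b] * U[a][b]
    return [[sum(S[j][b] * row[b] for b in range(n)) for j in range(q)]
            for row in U]


def interpolate_sum_factorized(S, U):
    """V[i][j] = sum_{a,b} S[i][a] * S[j][b] * U[a][b], as two 1D sweeps."""
    return apply_1d(S, apply_1d(S, U, axis=1), axis=0)


def interpolate_naive(S, U):
    """Same values computed with the full tensor-product (Kronecker) sum."""
    q, n = len(S), len(S[0])
    return [[sum(S[i][a] * S[j][b] * U[a][b]
                 for a, b in product(range(n), repeat=2))
             for j in range(q)]
            for i in range(q)]


# Illustrative data (an assumption): linear Lagrange basis on [0, 1]
# evaluated at the points 0, 0.5, 1, and nodal values U on a 2 x 2 grid.
S = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
U = [[1.0, 2.0], [3.0, 4.0]]
V = interpolate_sum_factorized(S, U)
```

With n basis functions and roughly n evaluation points per direction in d dimensions, the naive sum costs on the order of n^(2d) operations per cell, while the sweep-by-sweep form costs on the order of d·n^(d+1).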

Detailed description

Bibliographic details
Published in: ACM transactions on mathematical software 2019-08, Vol.45 (3), p.1-40, Article 29
Main authors: Kronbichler, Martin, Kormann, Katharina
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 40
container_issue 3
container_start_page 1
container_title ACM transactions on mathematical software
container_volume 45
creator Kronbichler, Martin
Kormann, Katharina
description We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.
doi_str_mv 10.1145/3325864
format Article
fullrecord <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3325864</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3325864</sourcerecordid><originalsourceid>FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</originalsourceid><addsrcrecordid>eNo90D1PwzAYBGALgUQoiJ3JG1Pgff0RJyMqTUEqdIE5so0tGdKksh0E_56iFqYb7tENR8glwg2ikLecM1lX4ogUKKUqFWvkMSkAmrrkEuCUnKX0DgAMFRbkudUp0yedY_gq2-gcXXzqftI5jAMdPb0PyY5DDsM0Tokude_iRxhoG4aQd7Z3Gzdkut66qPMY0zk58bpP7uKQM_LaLl7mD-VqvXyc361KzYTIpauNZ8pYJarKKGx87ZiFWllsjEGJHLQAFJWoGWrhOFPgrTFMvFUNU43jM3K937VxTCk6321j2Oj43SF0vzd0hxt28movtd38o7_yByiDVuI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators</title><source>ACM Digital Library Complete</source><creator>Kronbichler, Martin ; Kormann, Katharina</creator><creatorcontrib>Kronbichler, Martin ; Kormann, Katharina</creatorcontrib><description>We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. 
On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.</description><identifier>ISSN: 0098-3500</identifier><identifier>EISSN: 1557-7295</identifier><identifier>DOI: 10.1145/3325864</identifier><language>eng</language><publisher>New York, NY, USA: ACM</publisher><subject>Architectures ; Computer systems organization ; Differential equations ; Mathematical analysis ; Mathematical software ; Mathematical software performance ; Mathematics of computing ; Multicore architectures ; Parallel architectures ; Partial differential equations ; Solvers</subject><ispartof>ACM transactions on mathematical software, 2019-08, Vol.45 (3), p.1-40, Article 29</ispartof><rights>ACM</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</citedby><cites>FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</cites><orcidid>0000-0001-8406-835X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://dl.acm.org/doi/pdf/10.1145/3325864$$EPDF$$P50$$Gacm$$H</linktopdf><link.rule.ids>314,780,784,2282,27924,27925,40196,76228</link.rule.ids></links><search><creatorcontrib>Kronbichler, Martin</creatorcontrib><creatorcontrib>Kormann, Katharina</creatorcontrib><title>Fast Matrix-Free Evaluation 
of Discontinuous Galerkin Finite Element Operators</title><title>ACM transactions on mathematical software</title><addtitle>ACM TOMS</addtitle><description>We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. 
Our implementations are publicly available through the deal.II finite element library.</description><subject>Architectures</subject><subject>Computer systems organization</subject><subject>Differential equations</subject><subject>Mathematical analysis</subject><subject>Mathematical software</subject><subject>Mathematical software performance</subject><subject>Mathematics of computing</subject><subject>Multicore architectures</subject><subject>Parallel architectures</subject><subject>Partial differential equations</subject><subject>Solvers</subject><issn>0098-3500</issn><issn>1557-7295</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNo90D1PwzAYBGALgUQoiJ3JG1Pgff0RJyMqTUEqdIE5so0tGdKksh0E_56iFqYb7tENR8glwg2ikLecM1lX4ogUKKUqFWvkMSkAmrrkEuCUnKX0DgAMFRbkudUp0yedY_gq2-gcXXzqftI5jAMdPb0PyY5DDsM0Tokude_iRxhoG4aQd7Z3Gzdkut66qPMY0zk58bpP7uKQM_LaLl7mD-VqvXyc361KzYTIpauNZ8pYJarKKGx87ZiFWllsjEGJHLQAFJWoGWrhOFPgrTFMvFUNU43jM3K937VxTCk6321j2Oj43SF0vzd0hxt28movtd38o7_yByiDVuI</recordid><startdate>20190801</startdate><enddate>20190801</enddate><creator>Kronbichler, Martin</creator><creator>Kormann, Katharina</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-8406-835X</orcidid></search><sort><creationdate>20190801</creationdate><title>Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators</title><author>Kronbichler, Martin ; Kormann, Katharina</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a244t-e8bf27bc7466b719f8e2c087c19bb15130a401464821a4e3270fcbb24d69279e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Architectures</topic><topic>Computer systems organization</topic><topic>Differential equations</topic><topic>Mathematical analysis</topic><topic>Mathematical software</topic><topic>Mathematical 
software performance</topic><topic>Mathematics of computing</topic><topic>Multicore architectures</topic><topic>Parallel architectures</topic><topic>Partial differential equations</topic><topic>Solvers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kronbichler, Martin</creatorcontrib><creatorcontrib>Kormann, Katharina</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on mathematical software</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kronbichler, Martin</au><au>Kormann, Katharina</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators</atitle><jtitle>ACM transactions on mathematical software</jtitle><stitle>ACM TOMS</stitle><date>2019-08-01</date><risdate>2019</risdate><volume>45</volume><issue>3</issue><spage>1</spage><epage>40</epage><pages>1-40</pages><artnum>29</artnum><issn>0098-3500</issn><eissn>1557-7295</eissn><abstract>We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. 
On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.</abstract><cop>New York, NY, USA</cop><pub>ACM</pub><doi>10.1145/3325864</doi><tpages>40</tpages><orcidid>https://orcid.org/0000-0001-8406-835X</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0098-3500
ispartof ACM transactions on mathematical software, 2019-08, Vol.45 (3), p.1-40, Article 29
issn 0098-3500
1557-7295
language eng
recordid cdi_crossref_primary_10_1145_3325864
source ACM Digital Library Complete
subjects Architectures
Computer systems organization
Differential equations
Mathematical analysis
Mathematical software
Mathematical software performance
Mathematics of computing
Multicore architectures
Parallel architectures
Partial differential equations
Solvers
title Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T00%3A41%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fast%20Matrix-Free%20Evaluation%20of%20Discontinuous%20Galerkin%20Finite%20Element%20Operators&rft.jtitle=ACM%20transactions%20on%20mathematical%20software&rft.au=Kronbichler,%20Martin&rft.date=2019-08-01&rft.volume=45&rft.issue=3&rft.spage=1&rft.epage=40&rft.pages=1-40&rft.artnum=29&rft.issn=0098-3500&rft.eissn=1557-7295&rft_id=info:doi/10.1145/3325864&rft_dat=%3Cacm_cross%3E3325864%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true