Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators
Published in: | ACM transactions on mathematical software 2019-08, Vol.45 (3), p.1-40, Article 29 |
---|---|
Main authors: | Kronbichler, Martin; Kormann, Katharina |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 40 |
---|---|
container_issue | 3 |
container_start_page | 1 |
container_title | ACM transactions on mathematical software |
container_volume | 45 |
creator | Kronbichler, Martin; Kormann, Katharina |
description | We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library. |
doi_str_mv | 10.1145/3325864 |
format | Article |
publisher | New York, NY, USA: ACM |
orcid | 0000-0001-8406-835X |
fulltext | fulltext |
identifier | ISSN: 0098-3500 |
ispartof | ACM transactions on mathematical software, 2019-08, Vol.45 (3), p.1-40, Article 29 |
issn | 0098-3500 1557-7295 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3325864 |
source | ACM Digital Library Complete |
subjects | Architectures; Computer systems organization; Differential equations; Mathematical analysis; Mathematical software; Mathematical software performance; Mathematics of computing; Multicore architectures; Parallel architectures; Partial differential equations; Solvers |
title | Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T00%3A41%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fast%20Matrix-Free%20Evaluation%20of%20Discontinuous%20Galerkin%20Finite%20Element%20Operators&rft.jtitle=ACM%20transactions%20on%20mathematical%20software&rft.au=Kronbichler,%20Martin&rft.date=2019-08-01&rft.volume=45&rft.issue=3&rft.spage=1&rft.epage=40&rft.pages=1-40&rft.artnum=29&rft.issn=0098-3500&rft.eissn=1557-7295&rft_id=info:doi/10.1145/3325864&rft_dat=%3Cacm_cross%3E3325864%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
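The abstract names two kernel-level techniques: sum factorization for quadrature on tensor-product (quadrilateral/hexahedral) cells, and an even-odd decomposition of the one-dimensional interpolations. The following is a minimal pure-Python sketch of both, not the deal.II implementation. It assumes a 2D Lagrange basis on nodes symmetric about 1/2, evaluated at points symmetric about 1/2; midpoint-rule points stand in for Gauss points only to keep the sketch short. All function names (`lagrange_matrix`, `direct_kernel`, `even_odd_kernel`, `interpolate_2d`, `interpolate_2d_dense`) are hypothetical.

```python
def lagrange_matrix(nodes, points):
    """S[q][i]: the i-th Lagrange polynomial on `nodes`, evaluated at points[q]."""
    S = []
    for p in points:
        row = []
        for i, xi in enumerate(nodes):
            val = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    val *= (p - xj) / (xi - xj)
            row.append(val)
        S.append(row)
    return S


def direct_kernel(S):
    """1D interpolation v = S u as a plain matrix-vector product (m*n multiplications)."""
    def apply(u):
        return [sum(s * x for s, x in zip(row, u)) for row in S]
    return apply


def even_odd_kernel(S):
    """1D interpolation v = S u using the even-odd decomposition.

    Assumes the symmetry S[m-1-q][n-1-i] == S[q][i], which holds when both the
    nodes and the points are symmetric about 1/2. Uses about m*n/2 multiplications.
    """
    m, n = len(S), len(S[0])
    hn, rows = n // 2, (m + 1) // 2
    # Precomputed once per basis: even and odd halves of the first ceil(m/2) rows.
    A = [[0.5 * (S[q][i] + S[q][n - 1 - i]) for i in range(hn)] for q in range(rows)]
    B = [[0.5 * (S[q][i] - S[q][n - 1 - i]) for i in range(hn)] for q in range(rows)]

    def apply(u):
        e = [u[i] + u[n - 1 - i] for i in range(hn)]  # even part of the input
        o = [u[i] - u[n - 1 - i] for i in range(hn)]  # odd part of the input
        v = [0.0] * m
        for q in range(rows):
            even = sum(a * x for a, x in zip(A[q], e))
            odd = sum(b * x for b, x in zip(B[q], o))
            if n % 2:  # the middle basis function enters v[q] and v[m-1-q] identically
                even += S[q][hn] * u[hn]
            v[m - 1 - q] = even - odd  # mirrored point
            v[q] = even + odd
        return v
    return apply


def interpolate_2d(kernel, U):
    """V[qy][qx] = sum_{i,j} S[qx][i] * S[qy][j] * U[j][i] by sum factorization.

    Two sweeps of a 1D kernel: O(n^3) work in 2D instead of O(n^4) for the
    dense tensor-product matrix.
    """
    T = [kernel(row) for row in U]             # sweep in x: T[j][qx]
    cols = [kernel(list(c)) for c in zip(*T)]  # sweep in y: cols[qx][qy]
    return [list(r) for r in zip(*cols)]       # transpose back to V[qy][qx]


def interpolate_2d_dense(S, U):
    """Reference: apply the full (m^2 x n^2) tensor-product matrix entry by entry."""
    m, n = len(S), len(U)
    return [[sum(S[qx][i] * S[qy][j] * U[j][i] for j in range(n) for i in range(n))
             for qx in range(m)] for qy in range(m)]


if __name__ == "__main__":
    n, m = 4, 5
    nodes = [i / (n - 1) for i in range(n)]     # equidistant nodes, symmetric about 1/2
    points = [(q + 0.5) / m for q in range(m)]  # midpoint-rule points, symmetric about 1/2
    S = lagrange_matrix(nodes, points)
    U = [[x * y * y for x in nodes] for y in nodes]  # nodal values of f(x, y) = x*y^2
    V = interpolate_2d(even_odd_kernel(S), U)
    print(V[0][0], points[0] ** 3)  # interpolated vs exact f at the first point
```

The same two-sweep structure extends to 3D as three sweeps, which is where the gap to the dense product (O(n^4) versus O(n^6) work per cell) matters most. Production kernels of the kind the abstract describes additionally vectorize each sweep across several cells or faces; this sketch only demonstrates the arithmetic identities.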