Provably optimal parallel transport sweeps on semi-structured grids
Published in: | Journal of computational physics 2020-04, Vol.407 (C), p.109234, Article 109234 |
---|---|
Main authors: | , , , , , , , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Summary: | • First theoretical and numerical demonstration of a parallel transport-sweep algorithm that executes sweeps in the minimum possible number of stages.
• The "optimal sweep algorithm" allows a given code to choose parameters that minimize execution time for a given problem on a given machine.
• Excellent parallel scaling to >1.5M processes with simple grids and with polyhedral spatial grids that resolve fine details in a nuclear reactor.
We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on a class of grids in 2D and 3D Cartesian geometry that are regular at a coarse level but arbitrary within the coarse blocks. We describe these algorithms and show that they always execute the full eight-octant (or four-quadrant if 2D) sweep in the minimum possible number of stages for a given $P_x \times P_y \times P_z$ partitioning. Computational results confirm that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. Our PDT transport code has achieved approximately 68% parallel efficiency with >1.5M parallel threads, relative to 8 threads, on a simple weak-scaling problem with only three energy groups, 10 directions per octant, and 4096 cells/thread. Our ARDRA code has achieved 71% efficiency with >1.5M cores, relative to 16 cores, with 36 directions per octant and 48 energy groups. We demonstrate similar efficiencies with PDT on a realistic set of nuclear-reactor test problems, with unstructured meshes that resolve fine geometric details. These results demonstrate that discrete-ordinates transport sweeps can be executed with high efficiency using more than $10^6$ parallel processes. |
---|---|
ISSN: | 0021-9991 1090-2716 |
DOI: | 10.1016/j.jcp.2020.109234 |
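
The parallel efficiencies quoted in the summary are weak-scaling figures: the work per thread is held fixed (for example, 4096 cells/thread) while the thread count grows, and efficiency is measured relative to a small reference run. A minimal sketch of that ratio follows, assuming the standard weak-scaling definition (reference wall time divided by wall time at scale). The function name and the timings are hypothetical illustrations, not values or code from the article.

```python
def weak_scaling_efficiency(t_ref: float, t_scaled: float) -> float:
    """Weak-scaling parallel efficiency.

    t_ref    -- wall time of the reference run (e.g. 8 threads)
    t_scaled -- wall time of the large run with the same work per thread

    With perfect weak scaling the two times are equal and the result is 1.0.
    """
    if t_ref <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_ref / t_scaled


# Hypothetical timings, chosen only to illustrate the arithmetic.
efficiency = weak_scaling_efficiency(t_ref=10.0, t_scaled=14.7)
print(f"{efficiency:.1%}")
```

Because the per-thread workload is constant, any growth in wall time at scale reflects overhead such as communication or idle stages in the sweep, which is what the efficiency figure captures.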