A three-dimensional domain decomposition method for large-scale DFT electronic structure calculations

Bibliographic details
Published in: Computer Physics Communications, 2014-03, Vol. 185 (3), pp. 777-789
Main authors: Duy, Truong Vinh Truong; Ozaki, Taisuke
Format: Article
Language: English
Description
Summary: With tens of petaflops supercomputers already in operation and exaflops machines expected to appear within the next 10 years, efficient parallel computational methods are required to take advantage of such extreme-scale machines. In this paper, we present a three-dimensional domain decomposition scheme for enabling large-scale electronic structure calculations based on density functional theory (DFT) on massively parallel computers. It is composed of two methods: (i) the atom decomposition method and (ii) the grid decomposition method. In the former method, we develop a modified recursive bisection method based on the moment of inertia tensor to reorder the atoms along a principal axis so that atoms that are close in real space are also close on the axis to ensure data locality. The atoms are then divided into sub-domains depending on their projections onto the principal axis in a balanced way among the processes. In the latter method, we define four data structures for the partitioning of grid points that are carefully constructed to make data locality consistent with that of the clustered atoms for minimizing data communications between the processes. We also propose a decomposition method for solving the Poisson equation using the three-dimensional FFT in the Hartree potential calculation, which is shown to be better in terms of communication efficiency than a previously proposed parallelization method based on a two-dimensional decomposition. For evaluation, we perform benchmark calculations with our open-source DFT code, OpenMX, paying particular attention to the O(N) Krylov subspace method. The results show that our scheme exhibits good strong and weak scaling properties, with the parallel efficiency at 131,072 cores being 67.7% compared to the baseline of 16,384 cores with 131,072 atoms of the diamond structure on the K computer.
ISSN: 0010-4655, 1879-2944
DOI: 10.1016/j.cpc.2013.11.008
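
The atom decomposition outlined in the summary (project atoms onto a principal axis of the inertia tensor, then split them in a balanced way) can be illustrated with a minimal Python sketch. This is not the OpenMX implementation and not the authors' modified algorithm: the function names `principal_axis` and `bisect_atoms`, the use of unit masses, and the choice of the axis with the smallest moment of inertia (the direction of greatest spatial extent) are all assumptions made for illustration.

```python
import numpy as np


def principal_axis(coords):
    """Return the unit vector with the smallest moment of inertia.

    Assumption: unit masses, and the axis of greatest spatial extent is
    the one to split along. The paper's modified method may differ.
    """
    centered = coords - coords.mean(axis=0)
    # Inertia tensor for unit masses: I = sum_i (|r_i|^2 * E - r_i r_i^T)
    r2_total = np.einsum("ij,ij->", centered, centered)
    inertia = r2_total * np.eye(3) - centered.T @ centered
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # axis with the smallest moment of inertia.
    _, eigvecs = np.linalg.eigh(inertia)
    return eigvecs[:, 0]


def bisect_atoms(coords, indices, n_parts):
    """Recursively split atom indices into n_parts spatially compact groups."""
    if n_parts == 1 or len(indices) == 0:
        return [indices] + [indices[:0]] * (n_parts - 1)
    # Order the atoms by their projection onto the principal axis, so that
    # atoms close in real space are also close in the ordering.
    axis = principal_axis(coords[indices])
    order = indices[np.argsort(coords[indices] @ axis, kind="stable")]
    # Split proportionally so that each final group gets a balanced share.
    left_parts = n_parts // 2
    n_left = len(order) * left_parts // n_parts
    return (bisect_atoms(coords, order[:n_left], left_parts)
            + bisect_atoms(coords, order[n_left:], n_parts - left_parts))


if __name__ == "__main__":
    # Hypothetical example: 8 atoms in a slightly zigzag chain along x.
    chain = np.array([[float(i), 0.1 * (i % 2), 0.0] for i in range(8)])
    for rank, part in enumerate(bisect_atoms(chain, np.arange(8), 4)):
        print(rank, part.tolist())
```

Each returned group would correspond to the atoms assigned to one process. Recomputing the principal axis at every level of the recursion lets the split direction follow the local shape of each sub-domain rather than a single global axis.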