GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions

Bibliographic details
Published in: Computer physics communications 2020-09, Vol.254 (C), p.107314, Article 107314
Main authors: Huhn, William P., Lange, Björn, Yu, Victor Wen-zhe, Yoon, Mina, Blum, Volker
Format: Article
Language: English
Description
Abstract: We present an implementation of all-electron density-functional theory for massively parallel GPU-based platforms, using localized atom-centered basis functions and real-space integration grids. Special attention is paid to domain decomposition of the problem on non-uniform grids, which enables compute- and memory-parallel execution across thousands of nodes for real-space operations, e.g. the update of the electron density, the integration of the real-space Hamiltonian matrix, and the calculation of Pulay forces. To assess the performance of our GPU implementation, we performed benchmarks on three different architectures using a 103-material test set. We find that operations which rely on dense serial linear algebra show dramatic speedups from GPU acceleration: in particular, SCF iterations including force and stress calculations exhibit speedups ranging from 4.5 to 6.6. For the architectures and problem types investigated here, this translates to an expected overall speedup of 3 to 4 for the entire calculation (including non-GPU-accelerated parts), for problems featuring several tens to hundreds of atoms. Additional calculations for a 375-atom Bi2Se3 bilayer show that the present GPU strategy scales for large-scale distributed-parallel simulations.
• Real-space electronic structure theory is naturally suited to GPU acceleration.
• Minimal code rewrite was necessary to port domain decomposition algorithms to GPUs.
• GPU speedups of 3-4X in overall time-to-solution are observed for the FHI-aims code.
• GPU speedups over 10X are observed for algorithms dominated by dense linear algebra.
ISSN: 0010-4655, 1879-2944
DOI: 10.1016/j.cpc.2020.107314