Energy efficiency of finite difference algorithms on multicore CPUs, GPUs, and Intel Xeon Phi processors
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In addition to hardware wall-time restrictions commonly seen in
high-performance computing systems, it is likely that future systems will also
be constrained by energy budgets. In the present work, finite difference
algorithms of varying computational and memory intensity are evaluated with
respect to both energy efficiency and runtime on an Intel Ivy Bridge CPU node,
an Intel Xeon Phi Knights Landing processor, and an NVIDIA Tesla K40c GPU. The
conventional approach of storing the discretised derivatives in global arrays
for solution advancement is found to be inefficient in terms of both energy
consumption and runtime. In contrast, a class of algorithms in which the
discretised derivatives are evaluated on-the-fly or stored as
thread-/process-local variables (yielding high compute intensity) is optimal
with respect to both energy consumption and runtime. On all three hardware
architectures considered, a speed-up of ~2 and an energy saving of ~2 are
observed for the highly compute-intensive algorithms compared to the
memory-intensive algorithm. The energy consumption is found to be proportional
to runtime, irrespective of the power consumed, and the GPU offers an energy
saving of ~5 compared to the same algorithm on a CPU node.
DOI: 10.48550/arxiv.1709.09713
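
The abstract contrasts two classes of finite difference algorithms: one that
stores the discretised derivatives in global arrays before advancing the
solution, and one that evaluates the derivatives on-the-fly in
thread-/process-local variables. The sketch below is a minimal C illustration
of that distinction for a 1D linear advection equation; the grid size N, the
wave speed c, the time step, and the second-order central-difference scheme
are illustrative assumptions and are not taken from the paper.

```c
#include <stdio.h>

#define N 1024

static double u[N], u_new[N], dudx[N];   /* solution, update, derivative work array */

/* Memory-intensive variant: central differences are first written to the
 * global array dudx, then read back in a second loop to advance the
 * solution, incurring extra global-memory traffic. */
static void advect_stored(double c, double dt, double dx)
{
    for (int i = 1; i < N - 1; ++i)
        dudx[i] = (u[i + 1] - u[i - 1]) / (2.0 * dx);
    for (int i = 1; i < N - 1; ++i)
        u_new[i] = u[i] - c * dt * dudx[i];
}

/* Compute-intensive variant: the derivative is evaluated on-the-fly as a
 * local temporary inside the update loop, so no derivative array is ever
 * written to or read back from global memory. */
static void advect_on_the_fly(double c, double dt, double dx)
{
    for (int i = 1; i < N - 1; ++i) {
        double d = (u[i + 1] - u[i - 1]) / (2.0 * dx);
        u_new[i] = u[i] - c * dt * d;
    }
}

int main(void)
{
    for (int i = 0; i < N; ++i)
        u[i] = (double)i / N;              /* arbitrary initial field */

    /* Both variants produce the same u_new; they differ only in the
     * amount of global-memory traffic per grid point. */
    advect_stored(1.0, 1e-3, 1.0 / N);
    advect_on_the_fly(1.0, 1e-3, 1.0 / N);

    printf("u_new[N/2] = %f\n", u_new[N / 2]);
    return 0;
}
```

In this toy form the on-the-fly variant simply fuses the two loops; in the
paper's setting the same idea removes whole derivative arrays from global
memory, which is what drives the reported runtime and energy savings.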