Characterizing the performance benefit of hybrid memory system for HPC applications
Published in: | Parallel Computing, 2018-08, Vol. 76 (C), pp. 57-69 |
---|---|
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract:

• We perform a memory-centric analysis of the tested applications.
• We quantify the performance impact of different memory configurations (HBM-only, cache mode, and DRAM-only), problem size, and number of threads on representative HPC applications.
• We identify three factors that affect the performance benefit obtained from the HBM-DRAM memory system of KNL processors.
• We show that applications with regular memory access patterns benefit substantially from HBM. In contrast, applications with irregular access patterns may suffer performance degradation due to the higher latency of HBM.

Heterogeneous memory systems that combine multiple memory technologies are becoming common in high-performance computing environments. Modern processors and accelerators, such as the Intel Knights Landing (KNL) CPU and the NVIDIA Volta GPU, feature a small high-bandwidth memory (HBM) close to the compute cores and a large, normal-bandwidth memory connected off-chip. In theory, HBM can provide about four times the bandwidth of conventional DRAM. However, many factors affect the actual performance improvement that an application can achieve on such a system. In this paper, we focus on the Intel KNL system and identify the factors that most affect application performance, including the application's memory access pattern, the problem size, the threading level, and the memory configuration. We use a set of representative applications from both the scientific and the data-analytics domains. Our results show that applications with regular memory access benefit from MCDRAM (the on-package HBM of KNL), achieving up to three times the performance obtained using only DRAM. In contrast, applications with irregular memory access patterns are latency-bound and may suffer performance degradation when using only MCDRAM. We also provide a memory-centric analysis of four applications, identifying their major data objects and correlating their characteristics with the performance improvement on the testbed.
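The abstract's contrast between bandwidth-bound (regular access) and latency-bound (irregular access) applications can be illustrated with a small, hypothetical Python sketch. It shows a STREAM-style triad as the regular pattern, a random-index gather as the irregular pattern, a toy model of what moving each kind of kernel from DRAM to HBM would do, and a capacity check for the HBM-only configuration. All function names are illustrative; the bandwidth ratio of 4 follows the abstract's "about four times", while the latency ratio of 1.2 and the 16 GiB MCDRAM capacity are assumed placeholders, not measurements from the paper. Pure-Python interpreter overhead hides real memory effects, so this shows the shape of the access patterns rather than benchmarking them.

```python
import random

# Illustrative sketch only: names and default parameters are hypothetical,
# not taken from the paper.


def triad(b, c, s):
    """Regular access: STREAM-style triad a[i] = b[i] + s * c[i].
    Elements are visited in order, which hardware prefetchers can stream,
    so in compiled code this kind of loop is typically bandwidth-bound."""
    return [bi + s * ci for bi, ci in zip(b, c)]


def gather(x, idx):
    """Irregular access: read x in the order given by idx.
    With a random idx, consecutive loads touch unrelated cache lines,
    so in compiled code this kind of loop tends to be latency-bound."""
    return [x[i] for i in idx]


def predicted_speedup(kind, bw_ratio=4.0, latency_ratio=1.2):
    """Toy model of moving a kernel from DRAM to HBM.

    bw_ratio: HBM/DRAM bandwidth ratio (about 4 per the abstract).
    latency_ratio: HBM/DRAM latency ratio (1.2 is an assumed placeholder).
    Returns DRAM time / HBM time; a value below 1.0 means a slowdown.
    """
    if kind == "bandwidth":
        return bw_ratio  # runtime scales with 1 / bandwidth
    if kind == "latency":
        return 1.0 / latency_ratio  # runtime scales with latency
    raise ValueError(f"unknown kernel kind: {kind!r}")


def fits_in_hbm(footprint_bytes, hbm_capacity_bytes=16 * 2**30):
    """An HBM-only run needs the whole working set to fit in HBM.
    The 16 GiB default is an assumed KNL MCDRAM capacity."""
    return footprint_bytes <= hbm_capacity_bytes


if __name__ == "__main__":
    n = 100_000
    b = [1.0] * n
    c = [2.0] * n
    idx = list(range(n))
    random.shuffle(idx)  # random permutation -> irregular access order

    a = triad(b, c, 3.0)
    g = gather(a, idx)
    print("elements processed:", len(a), len(g))
    print("bandwidth-bound speedup:", predicted_speedup("bandwidth"))
    print("latency-bound speedup:", round(predicted_speedup("latency"), 3))

    # A triad over three 2**30-element float64 arrays needs 24 GiB,
    # which exceeds the assumed 16 GiB HBM capacity.
    print("fits in HBM:", fits_in_hbm(3 * 2**30 * 8))
```

The model deliberately ignores cache mode and mixed workloads; it only captures why the same memory can help one access pattern and hurt another. On a KNL system booted in flat mode, MCDRAM appears as a separate NUMA node, so an HBM-only run of a compiled version of these kernels can be launched with `numactl --membind=<node>`, where the node number depends on the system configuration.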
ISSN: | 0167-8191, 1872-7336 |
---|---|
DOI: | 10.1016/j.parco.2018.04.007 |