Improved Energy Efficiency for Multithreaded Kernels through Model-Based Autotuning

Bibliographic details
Main authors: Qasem, A., Cade, M. J., Tamir, D.
Format: Conference proceedings
Language: English
Description
Summary: In the last few years, the emergence of multicore architectures has revolutionized the landscape of high-performance computing. The multicore shift has not only increased the per-node performance potential of computer systems but has also made great strides in curbing power and heat dissipation. As we look to the future, however, the gains in performance and energy consumption are not going to come from hardware alone. Software needs to play a key role in achieving a high fraction of peak performance and keeping energy consumption within the desired envelope. To attain this goal, performance-enhancing and energy-conserving software needs to carefully orchestrate many architecture-sensitive parameters. In particular, the presence of shared caches on multicore architectures makes it necessary to consider, in concert, issues related to both parallelism and data locality to achieve the desired power-performance ratio. This paper studies the complex interaction among several code transformations that affect data locality, problem decomposition, and the selection of loops for parallelism. We characterize this interaction using static compiler analysis and generate a pruned search space suitable for efficient autotuning. We also extend a heuristic based on the number of threads, data reuse patterns, and the size and configuration of the shared cache to estimate a good synchronization interval for conserving energy in parallel code. We validate our choice of tuning parameters and evaluate our heuristic with experiments on a set of scientific and engineering kernels on four different multicore platforms. Results of the experimental study reveal several interesting properties of the transformation search space and demonstrate the effectiveness of the heuristic in predicting good synchronization intervals that reduce energy consumption without a significant degradation in performance.
ISSN: 2166-546X, 2166-5478
DOI: 10.1109/GREEN.2012.6200963