Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax–Wendroff correction stencil

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and pow...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	The international journal of high performance computing applications 2014-08, Vol.28 (3), p.301-318
Hauptverfasser:	You, Yang, Fu, Haohuan, Song, Shuaiwen Leon, Dehnavi, Maryam Mehri, Gan, Lin, Huang, Xiaomeng, Yang, Guangwen
Format:	Artikel
Sprache:	eng
Schlagworte:	Accelerators Architecture Bridges (structures) Central processing units Computation Computer architecture Mathematical models Oil exploration Optimization Optimization techniques Propagation Strategy Studies Wave propagation
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and power efficiency. In this paper, we accelerate the forward-modeling technique on the latest multi-core and many-core architectures such as Intel® Sandy Bridge CPUs, NVIDIA Fermi C2070 GPUs, NVIDIA Kepler K20× GPUs, and the Intel® Xeon Phi co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best performance. Although our stencil with 114 component variables poses several great challenges for performance optimization, and the low stencil ratio between computation and memory access is too inefficient to fully take advantage of our evaluated architectures, we manage to achieve performance efficiencies ranging from 4.730% to 20.02% of the theoretical peak. We also conduct cross-platform performance and power analysis (focusing on Kepler GPU and MIC) and the results could serve as insights for users selecting the most suitable accelerators for their targeted applications.
ISSN:	1094-3420 1741-2846
DOI:	10.1177/1094342014524807