AutOMP: An Automatic OpenMP Parallelization Generator for Variable-Oriented High-Performance Scientific Codes
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | OpenMP is a cross-platform API that extends C, C++ and Fortran and provides
shared-memory parallelism for those languages. The use of many-core and HPC
technologies for scientific computing has spread since the 1990s and now
plays a part in many fields of research. The relative ease of implementing
OpenMP, along with the development of multi-core shared-memory processors
(such as Intel Xeon Phi), makes OpenMP a favorable method for parallelization
when modernizing legacy codes. Legacy scientific codes usually hold a large
number of physical arrays that are used and updated by the code routines. In
most cases, the parallelization of such code focuses on loop parallelization.
A key step in this parallelization is deciding which of the variables in the
parallelized scope should be private (so that each thread holds its own copy
of them) and which should be shared across the threads. Another important
step is finding which variables should be synchronized after the loop
execution. In this work we present an automatic pre-processor that performs
these stages: AutOMP (Automatic OpenMP). AutOMP recognizes all the variable
assignments inside a loop. These variables are made private unless the
assignment is to an array element that depends on the loop index variable.
Afterwards, AutOMP finds the places where thread synchronization is needed
and which reduction operator is to be used. Finally, the program provides
the parallelization command to be used for parallelizing the loop. |
---|---|
DOI: | 10.48550/arxiv.1707.07137 |
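
The classification rule described in the summary can be illustrated with a toy sketch. This is not AutOMP's implementation: the function names `classify` and `directive` are hypothetical, the loop body is written in Python syntax as a stand-in for C or Fortran source, and the reduction detection (an augmented assignment such as `+=` to a scalar) is an assumption, since the summary does not say how the operator is inferred.

```python
import ast

# Assumed mapping from augmented-assignment operators to OpenMP reduction operators.
REDUCTION_OPS = {ast.Add: "+", ast.Mult: "*"}


def uses_name(node, name):
    """Return True if the variable `name` appears anywhere inside `node`."""
    return any(isinstance(n, ast.Name) and n.id == name for n in ast.walk(node))


def base_name(target):
    """Return the variable being assigned, e.g. 'b' for the target b[i][j]."""
    while isinstance(target, ast.Subscript):
        target = target.value
    return target.id if isinstance(target, ast.Name) else None


def classify(loop_body, index):
    """Split the variables assigned in `loop_body` into private and reduction sets."""
    private, reductions = set(), {}
    for stmt in ast.parse(loop_body).body:
        if isinstance(stmt, ast.Assign):
            targets, op = stmt.targets, None
        elif isinstance(stmt, ast.AugAssign):
            targets, op = [stmt.target], REDUCTION_OPS.get(type(stmt.op))
        else:
            continue  # the sketch only looks at top-level assignments
        for target in targets:
            # An array element indexed by the loop variable stays shared.
            if isinstance(target, ast.Subscript) and uses_name(target, index):
                continue
            name = base_name(target)
            if name is None:
                continue
            if op is not None and name not in private:
                reductions[name] = op  # e.g. total += ... becomes reduction(+:total)
            else:
                private.add(name)
                reductions.pop(name, None)  # a plain assignment overrides a reduction
    return private, reductions


def directive(loop_body, index):
    """Build an OpenMP directive string from the classification."""
    private, reductions = classify(loop_body, index)
    clauses = []
    if private:
        clauses.append("private(" + ", ".join(sorted(private)) + ")")
    for name in sorted(reductions):
        clauses.append(f"reduction({reductions[name]}:{name})")
    return " ".join(["#pragma omp parallel for"] + clauses)
```

For the loop body `tmp = a[i] * 2`, `b[i] = tmp + 1`, `total += b[i]` with index `i`, `directive` returns `#pragma omp parallel for private(tmp) reduction(+:total)`: `tmp` is assigned as a scalar and becomes private, `b[i]` depends on the loop index and stays shared, and `total` is accumulated and becomes a reduction variable.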