A new thread-level speculative automatic parallelization model and library based on duplicate code execution

Loop-efficient automatic parallelization has become increasingly relevant due to the growing number of cores in current processors and the programming effort needed to parallelize codes in these systems efficiently. However, automatic tools fail to extract all the available parallelism in irregular...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	The Journal of supercomputing 2024, Vol.80 (10), p.13714-13737
Hauptverfasser:	Martínez, Millán A., Fraguela, Basilio B., Cabaleiro, José C., Rivera, Francisco F.
Format:	Artikel
Sprache:	eng
Schlagworte:	Compilers Computer Science Interpreters Libraries Processor Architectures Programming Languages
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Loop-efficient automatic parallelization has become increasingly relevant due to the growing number of cores in current processors and the programming effort needed to parallelize codes in these systems efficiently. However, automatic tools fail to extract all the available parallelism in irregular loops with indirections, race conditions or potential data dependency violations, among many other possible causes. One of the successful ways to automatically parallelize these loops is the use of speculative parallelization techniques. This paper presents a new model and the corresponding C++ library that supports the speculative automatic parallelization of loops in shared memory systems, seeking competitive performance and scalability while keeping user effort to a minimum. The primary speculative strategy consists of redundantly executing chunks of loop iterations in a duplicate fashion. Namely, each chunk is executed speculatively in parallel to obtain results as soon as possible and sequentially in a different thread to validate the speculative results. The implementation uses C++11 threads and it makes intensive use of templates and advanced multithreading techniques. An evaluation based on various benchmarks confirms that our proposal provides a competitive level of performance and scalability.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-024-05987-0