The Tiny-Tasks Granularity Trade-Off: Balancing Overhead vs. Performance in Parallel Systems

Bibliographic details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2023-01, p. 1-17
Main authors: Bora, Stefan; Walker, Brenton; Fidler, Markus
Format: Article
Language: English
Description
Summary: Models of parallel processing systems typically assume that one has l workers and that jobs are split into an equal number of k = l tasks. Splitting jobs into k > l smaller tasks, i.e. using "tiny tasks", can yield performance and stability improvements because it reduces the variance in the amount of work assigned to each worker, but as k increases, the overhead involved in scheduling and managing the tasks begins to overtake the performance benefit. We perform extensive experiments on the effects of task granularity on an Apache Spark cluster and, based on these, develop a four-parameter model for task and job overhead that, in simulation, produces sojourn time distributions that match those of the real system. We also present analytical results which illustrate how using tiny tasks improves the stability region of split-merge systems, and analytical bounds on the sojourn and waiting time distributions of both split-merge and single-queue fork-join systems with tiny tasks. Finally, we combine the overhead model with the analytical models to produce an analytical approximation to the sojourn and waiting time distributions of systems with tiny tasks which include overhead. We also perform analogous tiny-tasks experiments on a hybrid multi-processor shared memory system based on MPI and OpenMP which has no load-balancing between nodes. Though no longer strict analytical bounds, our analytical approximations with overhead match both the Spark and MPI/OpenMP experimental results very well.
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3233712
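
The trade-off described in the summary can be illustrated with a small simulation. The following is a minimal sketch and not the authors' model: it assumes i.i.d. exponential task sizes, a single job processed in isolation (no queueing between jobs), idle workers pulling the next task from a shared queue, and one hypothetical constant `per_task_overhead` in place of the paper's four-parameter overhead model. The names `job_makespan` and `mean_makespan` are illustrative only.

```python
import heapq
import random


def job_makespan(k, l, rng, total_work=1.0, per_task_overhead=0.0):
    """Time to finish one job that is split into k tasks and run on l workers."""
    # Assumption: task sizes are i.i.d. exponential with mean total_work / k,
    # so the expected total work per job does not depend on k.
    mean_task = total_work / k
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * l
    heapq.heapify(free_at)
    for _ in range(k):
        # Hypothetical overhead model: a fixed cost added to every task.
        service = rng.expovariate(1.0 / mean_task) + per_task_overhead
        # The earliest-free worker takes the next task from the queue.
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + service)
    # The job is done when its last task finishes.
    return max(free_at)


def mean_makespan(k, l, n_jobs=2000, seed=0, per_task_overhead=0.0):
    """Average makespan over n_jobs independent jobs."""
    rng = random.Random(seed)
    total = sum(
        job_makespan(k, l, rng, per_task_overhead=per_task_overhead)
        for _ in range(n_jobs)
    )
    return total / n_jobs


if __name__ == "__main__":
    workers = 8
    for overhead in (0.0, 0.05):
        for k in (8, 16, 64, 256):
            avg = mean_makespan(k, workers, per_task_overhead=overhead)
            print(f"overhead={overhead:.2f} k={k:4d} mean makespan={avg:.3f}")
```

Under these assumptions, with zero overhead the mean makespan shrinks as k grows beyond l, because the work is spread more evenly across workers. With a non-zero per-task cost, large k eventually becomes slower than k = l, which is the qualitative effect the summary attributes to scheduling and management overhead.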