A trace-driven emulation framework to predict scalability of large clusters in presence of OS Jitter
Format: Conference paper
Language: English
Abstract: Various studies have pointed out the debilitating effects of OS jitter on the performance of parallel applications on large clusters such as ASCI Purple and MareNostrum at the Barcelona Supercomputing Center. These clusters use commodity OSes such as AIX and Linux, respectively. The biggest hindrance in evaluating any technique to mitigate jitter is getting access to such large-scale production HPC systems running a commodity OS. An earlier attempt at solving this problem was to emulate the effects of OS jitter on more widely available, jitter-free systems such as BlueGene/L. In this paper, we point out the shortcomings of such previous approaches and present the design and implementation of an emulation framework that overcomes them. We collect jitter traces on a commodity OS with a given configuration, under which we want to study the scaling behavior. These traces are then replayed on a jitter-free system to predict scalability in the presence of OS jitter. The application of this emulation framework is illustrated through a comparative scalability study of an off-the-shelf Linux distribution with a minimal configuration (runlevel 1) and a highly optimized embedded Linux distribution running on the IO nodes of BlueGene/L. We validate the results of our emulation both on a single node and on a real cluster. Our results indicate that an optimized OS, along with a technique to synchronize jitter, can reduce the performance degradation due to jitter from 99% (for the off-the-shelf Linux without any synchronization) to a much more tolerable 6% (for the highly optimized BlueGene/L IO node Linux with synchronization) at 2048 processors. Furthermore, perfect synchronization gives linear scaling with less than 1% slowdown, regardless of the OS used. However, as the jitter at different nodes becomes desynchronized, even with a minor skew across nodes, the optimized OS starts outperforming the off-the-shelf OS.
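The abstract describes the core mechanism: record jitter traces on a commodity OS (when and for how long the OS steals the CPU), then replay those delays on a jitter-free system while the application runs. The C sketch below is only a rough illustration of that replay step, not the authors' framework: it assumes a hypothetical two-column trace file of (start, duration) pairs in microseconds and injects the recorded delays as busy-waits into the compute phase of a bulk-synchronous loop on a single node.

```c
/* Minimal single-node sketch of jitter-trace replay (illustrative only).
 * Assumed trace format (hypothetical): one "start_us duration_us" pair per line,
 * recorded on the commodity OS under study. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* One recorded OS-jitter event: when it began and how long it lasted. */
typedef struct {
    double start_us;     /* offset from trace start, microseconds */
    double duration_us;  /* CPU time stolen from the application  */
} jitter_event;

static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Busy-wait for the given number of microseconds (emulates stolen CPU). */
static void inject_delay(double us)
{
    double end = now_us() + us;
    while (now_us() < end)
        ;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s jitter_trace.txt\n", argv[0]);
        return 1;
    }

    /* Load the recorded trace. */
    FILE *f = fopen(argv[1], "r");
    if (!f) { perror("fopen"); return 1; }
    static jitter_event trace[100000];
    size_t n = 0;
    while (n < sizeof trace / sizeof trace[0] &&
           fscanf(f, "%lf %lf", &trace[n].start_us, &trace[n].duration_us) == 2)
        n++;
    fclose(f);

    /* Bulk-synchronous loop: each iteration performs a fixed amount of
     * "compute"; trace events falling inside the iteration's ideal window
     * are replayed as extra busy-wait time. */
    const double compute_us = 1000.0;   /* nominal work per iteration */
    const int    iterations = 1000;
    double t0 = now_us();
    size_t next = 0;

    for (int i = 0; i < iterations; i++) {
        double iter_end = (i + 1) * compute_us;   /* ideal end time */

        inject_delay(compute_us);                 /* the "real" work */

        /* Replay every trace event scheduled before this point. */
        while (next < n && trace[next].start_us <= iter_end)
            inject_delay(trace[next++].duration_us);
    }

    double elapsed = now_us() - t0;
    printf("ideal %.0f us, with replayed jitter %.0f us (slowdown %.2f%%)\n",
           iterations * compute_us, elapsed,
           100.0 * (elapsed / (iterations * compute_us) - 1.0));
    return 0;
}
```

On a real cluster the interesting effect is not the per-node slowdown itself but how unsynchronized delays across nodes stretch collective operations; a per-node replayer like this would be run on every rank so that the relative alignment of the injected delays (synchronized vs. skewed) can be varied, which is the comparison the abstract reports.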
ISSN: 1552-5244, 2168-9253
DOI: 10.1109/CLUSTR.2008.4663776