K MapReduce: A scalable tool for data-processing and search/ensemble applications on large-scale supercomputers

Bibliographic details
Main authors:	Matsuda, Motohiko; Maruyama, Naoya; Takizawa, Shin'ichiro
Format:	Conference paper
Language:	English
Subjects:	Ions; Loading; Programming; Supercomputers

Description
Summary:	K MapReduce (KMR) is a high-performance MapReduce system for the MPI environment, targeting large-scale supercomputers such as the K computer. Its objectives are to ease programming for data processing and to achieve efficiency by utilizing the large amount of memory available in large-scale supercomputers. In KMR, the shuffling operation exchanges key-value pairs scalably, using collective communication algorithms that exploit the K computer's interconnect. Mapping and reducing operations are multi-threaded to achieve even greater efficiency on modern multi-core machines. Sorting, which is used extensively inside the shuffling and reducing operations, is optimized by using fixed-length packed keys instead of variable-length raw keys. Besides the MapReduce operations, KMR provides routines for collective file reading that enable affinity-aware optimizations. This paper presents the results of experimental performance studies of KMR on the K computer. Affinity-aware file loading improves performance by about 42% over a non-optimized implementation. We also show how KMR can be used to program real-world scientific applications such as meta-genome search and replica-exchange molecular dynamics.
ISSN:	1552-5244, 2168-9253
DOI:	10.1109/CLUSTER.2013.6702663
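
The summary mentions that sorting uses fixed-length packed keys instead of variable-length raw keys. The record does not say how KMR packs its keys, so the following is only a minimal sketch of the general idea, under assumed details: the function names `pack_key` and `sort_pairs`, the 8-byte width, and the zero-padding scheme are all hypothetical and are not taken from KMR.

```python
# Illustrative sketch only: NOT KMR's actual packing scheme.
# Assumption: a key is packed by taking its first `width` bytes,
# zero-padded on the right, and reading them as a big-endian integer.

def pack_key(raw: bytes, width: int = 8) -> int:
    """Pack the first `width` bytes of a raw key into one fixed-length integer."""
    prefix = raw[:width].ljust(width, b"\x00")
    return int.from_bytes(prefix, "big")


def sort_pairs(pairs):
    """Sort (key, value) pairs by packed key, using the raw key only to break ties."""
    # Most comparisons are decided by the fixed-length integer alone;
    # the variable-length raw key is consulted only when the packed keys are equal.
    return sorted(pairs, key=lambda kv: (pack_key(kv[0]), kv[0]))


pairs = [
    (b"banana", 1),
    (b"apple", 2),
    (b"app", 3),
    (b"applesaucy", 4),
    (b"applesauce", 5),
]
sorted_keys = [key for key, _ in sort_pairs(pairs)]
```

Because the packed integer preserves the byte-wise order of the key prefix, the result matches an ordinary sort on the raw keys, while most comparisons touch only a fixed-size value.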