Metabolic Flux Analysis in the Cloud


Bibliographic Details
Main authors: Dalman, Tolga; Doernemann, Tim; Juhnke, Ernst; Weitzel, Michael; Smith, Matthew; Wiechert, Wolfgang; Noh, Katharina; Freisleben, Bernd
Format: Conference proceedings
Language: English
Description
Summary: The MapReduce pattern popularized by Google has been used successfully in several scientific applications. This paper investigates whether a MapReduce approach utilizing on-demand resources from a Cloud is beneficial for simulation tasks in the area of Systems Biology and whether it can be seamlessly integrated into a service-oriented scientific workflow framework. In particular, an Amazon Elastic MapReduce Cloud implementation of the 13C-MFA (Metabolic Flux Analysis) Monte Carlo bootstrap approach, aimed at integration into an existing BPEL-based scientific workflow system, is presented. A comparison of a 64-node MapReduce cluster with a single-node computation reveals a total performance gain of up to a factor of 14, with a total cost for on-demand resources of 11. The most critical factor in terms of performance is I/O: the application suffers from the fact that I/O operations on many small files are expensive when using Amazon S3 and the Hadoop DFS.
DOI: 10.1109/eScience.2010.20
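
The summary describes a Monte Carlo bootstrap expressed as a MapReduce job. As a rough illustration of that pattern only, the following minimal Python sketch runs the map and reduce phases sequentially on a toy model. It is not the authors' implementation: the data in X and Y, the noise level SIGMA, the through-origin linear fit in fit_flux, and all function names are hypothetical stand-ins for the real 13C-MFA fitting step.

```python
import random
import statistics

# Hypothetical toy data: y = flux * x plus noise (NOT a real 13C-MFA model).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
SIGMA = 0.2  # assumed measurement standard deviation


def fit_flux(xs, ys):
    """Least-squares slope through the origin; stands in for the real fit."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def map_task(seed):
    """One Monte Carlo sample: perturb the measurements, refit, emit the estimate."""
    rng = random.Random(seed)  # per-task seed keeps samples independent and reproducible
    noisy = [y + rng.gauss(0.0, SIGMA) for y in Y]
    return fit_flux(X, noisy)


def reduce_task(estimates):
    """Aggregate all emitted estimates into a mean and a standard deviation."""
    return statistics.mean(estimates), statistics.stdev(estimates)


def run_bootstrap(n_samples):
    # Each seed is an independent unit of work, so the map phase could be
    # spread across cluster nodes; here it simply runs sequentially.
    estimates = list(map(map_task, range(n_samples)))
    return reduce_task(estimates)


if __name__ == "__main__":
    mean, std = run_bootstrap(1000)
    print(f"flux estimate: {mean:.3f} +/- {std:.3f}")
```

Because every map task depends only on its seed, the samples can be computed in any order and on any node. In this sketch each task returns a single number rather than writing a file, which sidesteps the many-small-files I/O cost that the summary identifies as the main bottleneck.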