Metabolic Flux Analysis in the Cloud

The MapReduce pattern popularized by Google has successfully been utilized in several scientific applications. In this paper, it is investigated whether a MapReduce approach utilizing on-demand resources from a Cloud is beneficial to perform simulation tasks in the area of Systems Biology and whethe...

Bibliographic details
Main authors: Dalman, Tolga; Doernemann, Tim; Juhnke, Ernst; Weitzel, Michael; Smith, Matthew; Wiechert, Wolfgang; Noh, Katharina; Freisleben, Bernd
Format: Conference proceedings
Language: eng
container_end_page 64
container_issue
container_start_page 57
container_title
container_volume
creator Dalman, Tolga
Doernemann, Tim
Juhnke, Ernst
Weitzel, Michael
Smith, Matthew
Wiechert, Wolfgang
Noh, Katharina
Freisleben, Bernd
description The MapReduce pattern popularized by Google has successfully been utilized in several scientific applications. In this paper, it is investigated whether a MapReduce approach utilizing on-demand resources from a Cloud is beneficial to perform simulation tasks in the area of Systems Biology and whether it can be seamlessly integrated into a service-oriented scientific workflow framework. In particular, an Amazon Elastic Map Reduce Cloud implementation of the 13C-MFA (Metabolic Flux Analysis) Monte Carlo bootstrap approach aimed at the integration into an existing BPEL-based scientific workflow system is presented. A comparison of a 64 node MapReduce cluster with a single node computation approach reveals a total performance gain up to a factor of 14, with a total cost for on-demand resources of 11. The most critical factor in terms of performance is I/O, i.e. our application suffers from the fact that I/O operations on many small files are expensive using Amazon S3 and the Hadoop DFS.
doi_str_mv 10.1109/eScience.2010.20
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5693899</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5693899</ieee_id><sourcerecordid>5693899</sourcerecordid><originalsourceid>FETCH-LOGICAL-c137t-e5c8f0d18af99d862a8678dff71d77b44e0a2bd3652163bb9e47075ae1f37ae83</originalsourceid><addsrcrecordid>eNotjD1LA0EUAFdEUOP1gs0Wthf37dfbV4bDqJCQQq3D3u1bXDkvkruA-fcGdJqBKUaIW1BzAEUP_NoVHjqea3VKWp2JijAo9OSsJmXPxTVYbW0gh3ApqnH8VCecRgS8EvdrnmK760snl_3hRy6G2B_HMsoyyOmDZdPvDulGXOTYj1z9eybel49vzXO92jy9NItV3YHBqWbXhawShJiJUvA6Bo8h5YyQEFtrWUXdJuOdBm_altiiQhcZssHIwczE3d-3MPP2e1--4v64dZ5MIDK_RKBAkw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Metabolic Flux Analysis in the Cloud</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Dalman, Tolga ; Doernemann, Tim ; Juhnke, Ernst ; Weitzel, Michael ; Smith, Matthew ; Wiechert, Wolfgang ; Noh, Katharina ; Freisleben, Bernd</creator><creatorcontrib>Dalman, Tolga ; Doernemann, Tim ; Juhnke, Ernst ; Weitzel, Michael ; Smith, Matthew ; Wiechert, Wolfgang ; Noh, Katharina ; Freisleben, Bernd</creatorcontrib><description>The MapReduce pattern popularized by Google has successfully been utilized in several scientific applications. In this paper, it is investigated whether a MapReduce approach utilizing on-demand resources from a Cloud is beneficial to perform simulation tasks in the area of Systems Biology and whether it can be seamlessly integrated into a service-oriented scientific workflow framework. In particular, an Amazon Elastic Map Reduce Cloud implementation of the 13C-MFA (Metabolix Flux Analysis) Monte Carlo bootstrap approach aimed at the integration into an existing BPEL-based scientific workflow system is presented. 
A comparison of a 64 node MapReduce cluster with a single node computation approach reveals a total performance gain up to a factor of 14, with a total cost for on-demand resources of 11. The most critical factor in terms of performance is I/O, i.e. our application suffers from the fact that I/O operations on many small files are expensive using Amazon S3 and the Hadoop DFS.</description><identifier>ISBN: 1424489571</identifier><identifier>ISBN: 9781424489572</identifier><identifier>EISBN: 9780769542904</identifier><identifier>EISBN: 0769542905</identifier><identifier>DOI: 10.1109/eScience.2010.20</identifier><language>eng</language><publisher>IEEE</publisher><subject>Algorithm design and analysis ; Analytical models ; Cloud computing ; Computational modeling ; Data models ; Hadoop ; MapReduce ; MFA ; Monte Carlo methods ; Systems Biology</subject><ispartof>2010 IEEE Sixth International Conference on e-Science, 2010, p.57-64</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c137t-e5c8f0d18af99d862a8678dff71d77b44e0a2bd3652163bb9e47075ae1f37ae83</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5693899$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2051,27904,54898</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5693899$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Dalman, Tolga</creatorcontrib><creatorcontrib>Doernemann, Tim</creatorcontrib><creatorcontrib>Juhnke, Ernst</creatorcontrib><creatorcontrib>Weitzel, Michael</creatorcontrib><creatorcontrib>Smith, Matthew</creatorcontrib><creatorcontrib>Wiechert, Wolfgang</creatorcontrib><creatorcontrib>Noh, Katharina</creatorcontrib><creatorcontrib>Freisleben, Bernd</creatorcontrib><title>Metabolic Flux Analysis in the 
Cloud</title><title>2010 IEEE Sixth International Conference on e-Science</title><addtitle>escience</addtitle><description>The MapReduce pattern popularized by Google has successfully been utilized in several scientific applications. In this paper, it is investigated whether a MapReduce approach utilizing on-demand resources from a Cloud is beneficial to perform simulation tasks in the area of Systems Biology and whether it can be seamlessly integrated into a service-oriented scientific workflow framework. In particular, an Amazon Elastic Map Reduce Cloud implementation of the 13C-MFA (Metabolix Flux Analysis) Monte Carlo bootstrap approach aimed at the integration into an existing BPEL-based scientific workflow system is presented. A comparison of a 64 node MapReduce cluster with a single node computation approach reveals a total performance gain up to a factor of 14, with a total cost for on-demand resources of 11. The most critical factor in terms of performance is I/O, i.e. our application suffers from the fact that I/O operations on many small files are expensive using Amazon S3 and the Hadoop DFS.</description><subject>Algorithm design and analysis</subject><subject>Analytical models</subject><subject>Cloud computing</subject><subject>Computational modeling</subject><subject>Data models</subject><subject>Hadoop</subject><subject>MapReduce</subject><subject>MFA</subject><subject>Monte Carlo methods</subject><subject>Systems 
Biology</subject><isbn>1424489571</isbn><isbn>9781424489572</isbn><isbn>9780769542904</isbn><isbn>0769542905</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotjD1LA0EUAFdEUOP1gs0Wthf37dfbV4bDqJCQQq3D3u1bXDkvkruA-fcGdJqBKUaIW1BzAEUP_NoVHjqea3VKWp2JijAo9OSsJmXPxTVYbW0gh3ApqnH8VCecRgS8EvdrnmK760snl_3hRy6G2B_HMsoyyOmDZdPvDulGXOTYj1z9eybel49vzXO92jy9NItV3YHBqWbXhawShJiJUvA6Bo8h5YyQEFtrWUXdJuOdBm_altiiQhcZssHIwczE3d-3MPP2e1--4v64dZ5MIDK_RKBAkw</recordid><startdate>201012</startdate><enddate>201012</enddate><creator>Dalman, Tolga</creator><creator>Doernemann, Tim</creator><creator>Juhnke, Ernst</creator><creator>Weitzel, Michael</creator><creator>Smith, Matthew</creator><creator>Wiechert, Wolfgang</creator><creator>Noh, Katharina</creator><creator>Freisleben, Bernd</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201012</creationdate><title>Metabolic Flux Analysis in the Cloud</title><author>Dalman, Tolga ; Doernemann, Tim ; Juhnke, Ernst ; Weitzel, Michael ; Smith, Matthew ; Wiechert, Wolfgang ; Noh, Katharina ; Freisleben, Bernd</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c137t-e5c8f0d18af99d862a8678dff71d77b44e0a2bd3652163bb9e47075ae1f37ae83</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Algorithm design and analysis</topic><topic>Analytical models</topic><topic>Cloud computing</topic><topic>Computational modeling</topic><topic>Data models</topic><topic>Hadoop</topic><topic>MapReduce</topic><topic>MFA</topic><topic>Monte Carlo methods</topic><topic>Systems Biology</topic><toplevel>online_resources</toplevel><creatorcontrib>Dalman, 
Tolga</creatorcontrib><creatorcontrib>Doernemann, Tim</creatorcontrib><creatorcontrib>Juhnke, Ernst</creatorcontrib><creatorcontrib>Weitzel, Michael</creatorcontrib><creatorcontrib>Smith, Matthew</creatorcontrib><creatorcontrib>Wiechert, Wolfgang</creatorcontrib><creatorcontrib>Noh, Katharina</creatorcontrib><creatorcontrib>Freisleben, Bernd</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Dalman, Tolga</au><au>Doernemann, Tim</au><au>Juhnke, Ernst</au><au>Weitzel, Michael</au><au>Smith, Matthew</au><au>Wiechert, Wolfgang</au><au>Noh, Katharina</au><au>Freisleben, Bernd</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Metabolic Flux Analysis in the Cloud</atitle><btitle>2010 IEEE Sixth International Conference on e-Science</btitle><stitle>escience</stitle><date>2010-12</date><risdate>2010</risdate><spage>57</spage><epage>64</epage><pages>57-64</pages><isbn>1424489571</isbn><isbn>9781424489572</isbn><eisbn>9780769542904</eisbn><eisbn>0769542905</eisbn><abstract>The MapReduce pattern popularized by Google has successfully been utilized in several scientific applications. In this paper, it is investigated whether a MapReduce approach utilizing on-demand resources from a Cloud is beneficial to perform simulation tasks in the area of Systems Biology and whether it can be seamlessly integrated into a service-oriented scientific workflow framework. 
In particular, an Amazon Elastic Map Reduce Cloud implementation of the 13C-MFA (Metabolic Flux Analysis) Monte Carlo bootstrap approach aimed at the integration into an existing BPEL-based scientific workflow system is presented. A comparison of a 64 node MapReduce cluster with a single node computation approach reveals a total performance gain up to a factor of 14, with a total cost for on-demand resources of 11. The most critical factor in terms of performance is I/O, i.e. our application suffers from the fact that I/O operations on many small files are expensive using Amazon S3 and the Hadoop DFS.</abstract><pub>IEEE</pub><doi>10.1109/eScience.2010.20</doi><tpages>8</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 1424489571
ispartof 2010 IEEE Sixth International Conference on e-Science, 2010, p.57-64
issn
language eng
recordid cdi_ieee_primary_5693899
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Algorithm design and analysis
Analytical models
Cloud computing
Computational modeling
Data models
Hadoop
MapReduce
MFA
Monte Carlo methods
Systems Biology
title Metabolic Flux Analysis in the Cloud
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T01%3A42%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Metabolic%20Flux%20Analysis%20in%20the%20Cloud&rft.btitle=2010%20IEEE%20Sixth%20International%20Conference%20on%20e-Science&rft.au=Dalman,%20Tolga&rft.date=2010-12&rft.spage=57&rft.epage=64&rft.pages=57-64&rft.isbn=1424489571&rft.isbn_list=9781424489572&rft_id=info:doi/10.1109/eScience.2010.20&rft_dat=%3Cieee_6IE%3E5693899%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769542904&rft.eisbn_list=0769542905&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5693899&rfr_iscdi=true