Controlling AWS Costs with Data Carousel

How can the costs of a 2.4-petabyte dataset hosted on AWS be managed? This is a question posed by the EOSDIS Large, Mission Scale Data working group. Part of the answer lies in keeping the data in low-cost Glacier storage; however, unbounded data-retrieval costs are incompatible with federal...

Bibliographic details
Main authors: Galewsky, Benjamin; Petravick, Donald; Daues, Greg; Readey, John; Kolak, Ryan
Format: Video
Language: eng
Online access: Order full text
creator Galewsky, Benjamin
Petravick, Donald
Daues, Greg
Readey, John
Kolak, Ryan
description How can the costs of a 2.4-petabyte dataset hosted on AWS be managed? This is a question posed by the EOSDIS Large, Mission Scale Data working group. Part of the answer lies in keeping the data in low-cost Glacier storage; however, unbounded data-retrieval costs are incompatible with federal budget rules. We will describe and demonstrate a data carousel model in which data is restored on a fixed, regular schedule and research jobs are run against the data before it is returned to cold storage. This gives NASA a bounded, fixed operating cost and allows researchers to scale their analyses as their budgets and needs permit. This presentation was given at the Earth Science Information Partners (ESIP) Summer Meeting, held online in July 2020.
doi_str_mv 10.6084/m9.figshare.12690038
format Video
fullrecord <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_6084_m9_figshare_12690038</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_6084_m9_figshare_12690038</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_6084_m9_figshare_126900383</originalsourceid><addsrcrecordid>eNpjYJAxNNAzM7Aw0c-11EvLTC_OSCxK1TM0MrM0MDC24GTQcM7PKynKz8nJzEtXcAwPVnDOLy4pVijPLMlQcEksSVRwTizKLy1OzeFhYE1LzClO5YXS3Awmbq4hzh66KUBVyZklqfEFRZm5iUWV8YYG8SAL43Mt42EWxsMsNCZTGwBdvz0f</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>video</recordtype></control><display><type>video</type><title>Controlling AWS Costs with Data Carousel</title><source>DataCite</source><creator>Galewsky, Benjamin ; Petravick, Donald ; Daues, Greg ; Readey, John ; Kolak, Ryan</creator><creatorcontrib>Galewsky, Benjamin ; Petravick, Donald ; Daues, Greg ; Readey, John ; Kolak, Ryan</creatorcontrib><description>How can the costs of a 2.4-petabyte dataset hosted on AWS be managed? This is a question posed by the EOSDIS Large, Mission Scale Data working group. Part of the answer lies in keeping the data in low-cost Glacier storage; however, unbounded data-retrieval costs are incompatible with federal budget rules. We will describe and demonstrate a data carousel model in which data is restored on a fixed, regular schedule and research jobs are run against the data before it is returned to cold storage. This gives NASA a bounded, fixed operating cost and allows researchers to scale their analyses as their budgets and needs permit. 
This presentation was given at the Earth Science Information Partners (ESIP) Summer Meeting, held online in July 2020.</description><identifier>DOI: 10.6084/m9.figshare.12690038</identifier><language>eng</language><publisher>ESIP</publisher><subject>Climate Science ; FOS: Electrical engineering, electronic engineering, information engineering ; Input, Output and Data Devices</subject><creationdate>2020</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,1894</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.6084/m9.figshare.12690038$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Galewsky, Benjamin</creatorcontrib><creatorcontrib>Petravick, Donald</creatorcontrib><creatorcontrib>Daues, Greg</creatorcontrib><creatorcontrib>Readey, John</creatorcontrib><creatorcontrib>Kolak, Ryan</creatorcontrib><title>Controlling AWS Costs with Data Carousel</title><description>How can the costs of a 2.4-petabyte dataset hosted on AWS be managed? This is a question posed by the EOSDIS Large, Mission Scale Data working group. Part of the answer lies in keeping the data in low-cost Glacier storage; however, unbounded data-retrieval costs are incompatible with federal budget rules. We will describe and demonstrate a data carousel model in which data is restored on a fixed, regular schedule and research jobs are run against the data before it is returned to cold storage. This gives NASA a bounded, fixed operating cost and allows researchers to scale their analyses as their budgets and needs permit. 
This presentation was given at the Earth Science Information Partners (ESIP) Summer Meeting, held online in July 2020.</description><subject>Climate Science</subject><subject>FOS: Electrical engineering, electronic engineering, information engineering</subject><subject>Input, Output and Data Devices</subject><fulltext>true</fulltext><rsrctype>video</rsrctype><creationdate>2020</creationdate><recordtype>video</recordtype><sourceid>PQ8</sourceid><recordid>eNpjYJAxNNAzM7Aw0c-11EvLTC_OSCxK1TM0MrM0MDC24GTQcM7PKynKz8nJzEtXcAwPVnDOLy4pVijPLMlQcEksSVRwTizKLy1OzeFhYE1LzClO5YXS3Awmbq4hzh66KUBVyZklqfEFRZm5iUWV8YYG8SAL43Mt42EWxsMsNCZTGwBdvz0f</recordid><startdate>20200722</startdate><enddate>20200722</enddate><creator>Galewsky, Benjamin</creator><creator>Petravick, Donald</creator><creator>Daues, Greg</creator><creator>Readey, John</creator><creator>Kolak, Ryan</creator><general>ESIP</general><scope>DYCCY</scope><scope>PQ8</scope></search><sort><creationdate>20200722</creationdate><title>Controlling AWS Costs with Data Carousel</title><author>Galewsky, Benjamin ; Petravick, Donald ; Daues, Greg ; Readey, John ; Kolak, Ryan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_6084_m9_figshare_126900383</frbrgroupid><rsrctype>videos</rsrctype><prefilter>videos</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Climate Science</topic><topic>FOS: Electrical engineering, electronic engineering, information engineering</topic><topic>Input, Output and Data Devices</topic><toplevel>online_resources</toplevel><creatorcontrib>Galewsky, Benjamin</creatorcontrib><creatorcontrib>Petravick, Donald</creatorcontrib><creatorcontrib>Daues, Greg</creatorcontrib><creatorcontrib>Readey, John</creatorcontrib><creatorcontrib>Kolak, Ryan</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Galewsky, Benjamin</au><au>Petravick, Donald</au><au>Daues, Greg</au><au>Readey, John</au><au>Kolak, Ryan</au><genre>unknown</genre><ristype>VIDEO</ristype><title>Controlling AWS Costs with Data Carousel</title><date>2020-07-22</date><risdate>2020</risdate><abstract>How can the costs of a 2.4-petabyte dataset hosted on AWS be managed? This is a question posed by the EOSDIS Large, Mission Scale Data working group. Part of the answer lies in keeping the data in low-cost Glacier storage; however, unbounded data-retrieval costs are incompatible with federal budget rules. We will describe and demonstrate a data carousel model in which data is restored on a fixed, regular schedule and research jobs are run against the data before it is returned to cold storage. This gives NASA a bounded, fixed operating cost and allows researchers to scale their analyses as their budgets and needs permit. This presentation was given at the Earth Science Information Partners (ESIP) Summer Meeting, held online in July 2020.</abstract><pub>ESIP</pub><doi>10.6084/m9.figshare.12690038</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.6084/m9.figshare.12690038
language eng
recordid cdi_datacite_primary_10_6084_m9_figshare_12690038
source DataCite
subjects Climate Science
FOS: Electrical engineering, electronic engineering, information engineering
Input, Output and Data Devices
title Controlling AWS Costs with Data Carousel
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T07%3A30%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:&rft.genre=unknown&rft.au=Galewsky,%20Benjamin&rft.date=2020-07-22&rft_id=info:doi/10.6084/m9.figshare.12690038&rft_dat=%3Cdatacite_PQ8%3E10_6084_m9_figshare_12690038%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true