Mapping Datasets to Object Storage System

In: 24th International Conference on Computing in High Energy & Nuclear Physics, Adelaide, Australia, November 4-8, 2019. Access libraries such as ROOT and HDF5 allow users to interact with datasets using high-level abstractions, like coordinate systems and associated slicing operations. Unfortunat...

Full description

Bibliographic Details
Main Authors: Chu, Xiaowei, LeFevre, Jeff, Montana, Aldrin, Robinson, Dana, Koziol, Quincey, Alvaro, Peter, Maltzahn, Carlos
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Chu, Xiaowei
LeFevre, Jeff
Montana, Aldrin
Robinson, Dana
Koziol, Quincey
Alvaro, Peter
Maltzahn, Carlos
description In: 24th International Conference on Computing in High Energy & Nuclear Physics, Adelaide, Australia, November 4-8, 2019. Access libraries such as ROOT and HDF5 allow users to interact with datasets using high-level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage system interfaces and are generally unable to fully benefit from modern fast storage devices. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever larger datasets. This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Ceph's extensible object model, avoiding re-implementation or even modification of these access libraries as much as possible. These programmable storage extensions, coupled with our distributed dataset mapping techniques, enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems, and 3) full use of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. They also create more opportunities for server-local optimizations. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and how we address some of the challenges revolving around data partitioning and composability of access operations.
doi_str_mv 10.48550/arxiv.2007.01789
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2007_01789</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2007_01789</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-52ece8c6d1506baff51aee2e6b83189c7b89763812eecdc129f03799d8ce6c343</originalsourceid><addsrcrecordid>eNotzrkOgkAUQNFpLIz6AVZOawHOwmylcU8wFtqTx_AgGBcCE6N_b1yq290cQsacxYlVis2gfdaPWDBmYsaNdX0y3UPT1LeKLiFAh6Gj4U4P-Rl9oMdwb6FCenx1Aa9D0ivh0uHo3wE5rVenxTZKD5vdYp5GoI2LlECP1uuCK6ZzKEvFAVGgzq3k1nmTW2e0tFwg-sJz4UomjXOF9ai9TOSATH7brzVr2voK7Sv7mLOvWb4BIrc7bw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Mapping Datasets to Object Storage System</title><source>arXiv.org</source><creator>Xiaowei ; Chu ; LeFevre, Jeff ; Montana, Aldrin ; Robinson, Dana ; Koziol, Quincey ; Alvaro, Peter ; Maltzahn, Carlos</creator><creatorcontrib>Xiaowei ; Chu ; LeFevre, Jeff ; Montana, Aldrin ; Robinson, Dana ; Koziol, Quincey ; Alvaro, Peter ; Maltzahn, Carlos</creatorcontrib><description>In 24th International Conference on Computing in High Energy &amp; Nuclear Physics, Adelaide, Australia, November 4-8 2019 Access libraries such as ROOT and HDF5 allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage systems interfaces and are generally unable to fully benefit from modern fast storage devices. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever larger datasets. 
This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Ceph's extensible object model, avoiding re-implementation or even modifications of these access libraries as much as possible. These programmable storage extensions coupled with our distributed dataset mapping techniques enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems and 3) fully leveraging of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. They also create more opportunities to conduct storage server-local optimizations specific to storage servers. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. 
We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and how we address some of the challenges revolving around data partitioning and composability of access operations.</description><identifier>DOI: 10.48550/arxiv.2007.01789</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><creationdate>2020-07</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2007.01789$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2007.01789$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Xiaowei</creatorcontrib><creatorcontrib>Chu</creatorcontrib><creatorcontrib>LeFevre, Jeff</creatorcontrib><creatorcontrib>Montana, Aldrin</creatorcontrib><creatorcontrib>Robinson, Dana</creatorcontrib><creatorcontrib>Koziol, Quincey</creatorcontrib><creatorcontrib>Alvaro, Peter</creatorcontrib><creatorcontrib>Maltzahn, Carlos</creatorcontrib><title>Mapping Datasets to Object Storage System</title><description>In 24th International Conference on Computing in High Energy &amp; Nuclear Physics, Adelaide, Australia, November 4-8 2019 Access libraries such as ROOT and HDF5 allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. 
Unfortunately, the implementations of access libraries are based on outdated assumptions about storage systems interfaces and are generally unable to fully benefit from modern fast storage devices. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever larger datasets. This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Ceph's extensible object model, avoiding re-implementation or even modifications of these access libraries as much as possible. These programmable storage extensions coupled with our distributed dataset mapping techniques enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems and 3) fully leveraging of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. They also create more opportunities to conduct storage server-local optimizations specific to storage servers. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. 
We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and how we address some of the challenges revolving around data partitioning and composability of access operations.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzrkOgkAUQNFpLIz6AVZOawHOwmylcU8wFtqTx_AgGBcCE6N_b1yq290cQsacxYlVis2gfdaPWDBmYsaNdX0y3UPT1LeKLiFAh6Gj4U4P-Rl9oMdwb6FCenx1Aa9D0ivh0uHo3wE5rVenxTZKD5vdYp5GoI2LlECP1uuCK6ZzKEvFAVGgzq3k1nmTW2e0tFwg-sJz4UomjXOF9ai9TOSATH7brzVr2voK7Sv7mLOvWb4BIrc7bw</recordid><startdate>20200703</startdate><enddate>20200703</enddate><creator>Xiaowei</creator><creator>Chu</creator><creator>LeFevre, Jeff</creator><creator>Montana, Aldrin</creator><creator>Robinson, Dana</creator><creator>Koziol, Quincey</creator><creator>Alvaro, Peter</creator><creator>Maltzahn, Carlos</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20200703</creationdate><title>Mapping Datasets to Object Storage System</title><author>Xiaowei ; Chu ; LeFevre, Jeff ; Montana, Aldrin ; Robinson, Dana ; Koziol, Quincey ; Alvaro, Peter ; Maltzahn, Carlos</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-52ece8c6d1506baff51aee2e6b83189c7b89763812eecdc129f03799d8ce6c343</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><toplevel>online_resources</toplevel><creatorcontrib>Xiaowei</creatorcontrib><creatorcontrib>Chu</creatorcontrib><creatorcontrib>LeFevre, Jeff</creatorcontrib><creatorcontrib>Montana, Aldrin</creatorcontrib><creatorcontrib>Robinson, Dana</creatorcontrib><creatorcontrib>Koziol, 
Quincey</creatorcontrib><creatorcontrib>Alvaro, Peter</creatorcontrib><creatorcontrib>Maltzahn, Carlos</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xiaowei</au><au>Chu</au><au>LeFevre, Jeff</au><au>Montana, Aldrin</au><au>Robinson, Dana</au><au>Koziol, Quincey</au><au>Alvaro, Peter</au><au>Maltzahn, Carlos</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mapping Datasets to Object Storage System</atitle><date>2020-07-03</date><risdate>2020</risdate><abstract>In 24th International Conference on Computing in High Energy &amp; Nuclear Physics, Adelaide, Australia, November 4-8 2019 Access libraries such as ROOT and HDF5 allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage systems interfaces and are generally unable to fully benefit from modern fast storage devices. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever larger datasets. This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Ceph's extensible object model, avoiding re-implementation or even modifications of these access libraries as much as possible. These programmable storage extensions coupled with our distributed dataset mapping techniques enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems and 3) fully leveraging of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. 
They also create more opportunities to conduct storage server-local optimizations specific to storage servers. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and how we address some of the challenges revolving around data partitioning and composability of access operations.</abstract><doi>10.48550/arxiv.2007.01789</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2007.01789
language eng
recordid cdi_arxiv_primary_2007_01789
source arXiv.org
subjects Computer Science - Distributed, Parallel, and Cluster Computing
title Mapping Datasets to Object Storage System
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A05%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mapping%20Datasets%20to%20Object%20Storage%20System&rft.au=Xiaowei&rft.date=2020-07-03&rft_id=info:doi/10.48550/arxiv.2007.01789&rft_dat=%3Carxiv_GOX%3E2007_01789%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true