Mapping Datasets to Object Storage System
In 24th International Conference on Computing in High Energy & Nuclear Physics, Adelaide, Australia, November 4-8 2019 Access libraries such as ROOT and HDF5 allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. Unfortunat...
Gespeichert in:
Hauptverfasser: | , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In 24th International Conference on Computing in High Energy &
Nuclear Physics, Adelaide, Australia, November 4-8 2019 Access libraries such as ROOT and HDF5 allow users to interact with datasets
using high level abstractions, like coordinate systems and associated slicing
operations. Unfortunately, the implementations of access libraries are based on
outdated assumptions about storage systems interfaces and are generally unable
to fully benefit from modern fast storage devices. The situation is getting
worse with rapidly evolving storage devices such as non-volatile memory and
ever larger datasets. This project explores distributed dataset mapping
infrastructures that can integrate and scale out existing access libraries
using Ceph's extensible object model, avoiding re-implementation or even
modifications of these access libraries as much as possible. These programmable
storage extensions coupled with our distributed dataset mapping techniques
enable: 1) access library operations to be offloaded to storage system servers,
2) the independent evolution of access libraries and storage systems and 3)
fully leveraging of the existing load balancing, elasticity, and failure
management of distributed storage systems like Ceph. They also create more
opportunities to conduct storage server-local optimizations specific to storage
servers. For example, storage servers might include local key/value stores
combined with chunk stores that require different optimizations than a local
file system. As storage servers evolve to support new storage devices like
non-volatile memory, these server-local optimizations can be implemented while
minimizing disruptions to applications. We will report progress on the means by
which distributed dataset mapping can be abstracted over particular access
libraries, including access libraries for ROOT data, and how we address some of
the challenges revolving around data partitioning and composability of access
operations. |
---|---|
DOI: | 10.48550/arxiv.2007.01789 |