T2: a customizable parallel database for multi-dimensional data
Published in: SIGMOD Record, March 1998, Vol. 27 (1), pp. 58-66
Main authors:
Format: Article
Language: English
Online access: Full text
Abstract: As computational power and storage capacity increase, processing
and analyzing large volumes of data play an increasingly important
part in many domains of scientific research. Typical examples of
large scientific datasets include long running simulations of
time-dependent phenomena that periodically generate snapshots of
their state (e.g. hydrodynamics and chemical transport simulation
for estimating pollution impact on water bodies [4, 6, 20],
magnetohydrodynamics simulation of planetary magnetospheres [32],
simulation of a flame sweeping through a volume [28], airplane wake
simulations [21]), archives of raw and processed remote sensing
data (e.g. AVHRR [25], Thematic Mapper [17], MODIS [22]), and
archives of medical images (e.g. confocal light microscopy, CT
imaging, MRI, sonography).
These datasets are usually multi-dimensional. The data
dimensions can be spatial coordinates, time, or experimental
conditions such as temperature, velocity or magnetic field. The
importance of such datasets has been recognized by several database
research groups and vendors, and several systems have been
developed for managing and/or visualizing them [2, 7, 14, 19, 26,
27, 29, 31].
These systems, however, focus on lineage management, retrieval
and visualization of multi-dimensional datasets. They provide
little or no support for analyzing or processing these datasets --
the assumption is that this is too application-specific to warrant
common support. As a result, applications that process these
datasets are usually decoupled from data storage and management,
resulting in inefficiency due to copying and loss of locality.
Furthermore, every application developer has to implement complex
support for managing and scheduling the processing.
Over the past three years, we have been working with several
scientific research groups to understand the processing
requirements for such applications [1, 5, 6, 10, 18, 23, 24, 28].
Our study of a large set of applications indicates that the
processing for such datasets is often highly stylized and shares
several important characteristics. Usually, both the input dataset
and the result being computed have underlying
multi-dimensional grids, and queries into the dataset are in the
form of ranges within each dimension of the grid. The basic
processing step usually consists of transforming individual input
items, mapping the transformed items to the output grid and
computing output items by aggregating, in some way, all the
transform …
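The stylized processing pattern described in the abstract — a range query selecting items within per-dimension bounds, a transform applied to each input item, a mapping onto an output grid, and an aggregation of mapped items — can be sketched as follows. This is a minimal illustration of the pattern, not T2's actual API; all function and parameter names here are hypothetical.

```python
# Sketch of the transform/map/aggregate pattern over a multi-dimensional
# grid, as described in the abstract. Names are illustrative only.

def process(input_grid, query_range, transform, to_output_cell, aggregate, init):
    """input_grid: dict mapping coordinate tuples (i, j, ...) to values.
    query_range: per-dimension (lo, hi) bounds forming the range query."""
    output = {}
    for coords, value in input_grid.items():
        # Range query: keep only items inside the bounds of every dimension.
        if all(lo <= c <= hi for c, (lo, hi) in zip(coords, query_range)):
            item = transform(value)            # transform the input item
            cell = to_output_cell(coords)      # map it to an output-grid cell
            output[cell] = aggregate(output.get(cell, init), item)
    return output

# Example: a 4x4 input grid, values doubled and summed into a coarser
# 2x2 output grid (each output cell aggregates a 2x2 block of inputs).
grid = {(i, j): i * 4 + j for i in range(4) for j in range(4)}
result = process(
    grid,
    query_range=[(0, 3), (0, 3)],
    transform=lambda v: v * 2,
    to_output_cell=lambda c: (c[0] // 2, c[1] // 2),
    aggregate=lambda acc, x: acc + x,
    init=0,
)
```

The key property this sketch captures is that the transform, the mapping to the output grid, and the aggregation are supplied by the application, while the iteration and range selection are generic — which is what makes the pattern amenable to common, customizable infrastructure.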
ISSN: 0163-5808
DOI: 10.1145/273244.273264