Supporting XML Based High-Level Abstractions on HDF5 Datasets: A Case Study in Automatic Data Virtualization

Bibliographic Details
Main Authors: Sahoo, Swarup Kumar; Agrawal, Gagan
Format: Book Chapter
Language: English
Online Access: Full text
Description
Summary: Recently, we have been focusing on the notion of automatic data virtualization. The goal is to enable automatic creation of efficient data services to support a high-level or virtual view of the data. The application developers express the processing assuming this virtual view, whereas the data is stored in a low-level format. The compiler uses the information about the low-level layout and the relationship between the virtual and the low-level layouts to generate efficient low-level data processing code. In this paper, we describe a specific implementation of this approach. We provide XML-based abstractions on datasets stored in the Hierarchical Data Format (HDF). A high-level XML Schema provides a logical view of the HDF5 dataset, hiding the actual layout details. Based on this view, the processing is specified using XQuery, the XML query language developed by the World Wide Web Consortium (W3C). The HDF5 data layout is exposed to the compiler using a low-level XML Schema. The relationship between the high-level and low-level Schemas is exposed using a Mapping Schema. We describe how our compiler can generate efficient code to access and process HDF5 datasets using the above information. A number of issues are addressed to ensure high locality in the processing of the datasets; these issues arise mainly because of the high-level nature of XQuery and because the actual data layout is abstracted.
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/11532378_22
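
Illustrative sketch: The summary describes queries written against a virtual view and a compiler that uses layout and mapping information to emit efficient low-level access code. The Python sketch below illustrates only that general idea and is not the authors' compiler. It assumes a plain list in place of an HDF5 dataset and a Python function in place of XQuery; every name in it (LowLevelLayout, virtual_view, query_on_view, query_generated) is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class LowLevelLayout:
    """Hypothetical stand-in for a low-level schema: a 2-D array stored flat, row-major."""
    rows: int
    cols: int

    def offset(self, x: int, y: int) -> int:
        # Mapping from a virtual (x, y) coordinate to a position in flat storage.
        return x * self.cols + y


def virtual_view(layout: LowLevelLayout, storage: List[float]) -> Iterator[Tuple[int, int, float]]:
    """High-level view: a stream of (x, y, value) records that hides the layout."""
    for x in range(layout.rows):
        for y in range(layout.cols):
            yield x, y, storage[layout.offset(x, y)]


def query_on_view(layout: LowLevelLayout, storage: List[float], x_lo: int, x_hi: int) -> float:
    """Mean value for x_lo <= x < x_hi, written only against the virtual view.

    This scans every record, because the view says nothing about where data lives.
    """
    vals = [v for (x, _, v) in virtual_view(layout, storage) if x_lo <= x < x_hi]
    return sum(vals) / len(vals)


def query_generated(layout: LowLevelLayout, storage: List[float], x_lo: int, x_hi: int) -> float:
    """What generated code might look like (an assumption, not the paper's output).

    The mapping shows that rows x_lo..x_hi-1 form one contiguous slab, so the
    query reads just that slab instead of scanning the whole dataset.
    """
    slab = storage[layout.offset(x_lo, 0):layout.offset(x_hi, 0)]
    return sum(slab) / len(slab)


layout = LowLevelLayout(rows=3, cols=4)
storage = [float(i) for i in range(12)]
print(query_on_view(layout, storage, 1, 3), query_generated(layout, storage, 1, 3))
```

Both functions return the same result; the second touches only the rows the query needs, which is the kind of locality benefit the summary attributes to knowing the low-level layout.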