epiC: an extensible and scalable system for processing big data
Published in: | Proceedings of the VLDB Endowment 2014-03, Vol.7 (7), p.541-552 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Summary: | The Big Data problem is characterized by the so-called 3V features: Volume (a huge amount of data), Velocity (a high data ingestion rate), and Variety (a mix of structured, semi-structured, and unstructured data). State-of-the-art solutions to the Big Data problem are largely based on the MapReduce framework (also known through its open-source implementation, Hadoop). Although Hadoop handles the data volume challenge successfully, it does not deal well with data variety, since its programming interfaces and associated data processing model are inconvenient and inefficient for handling structured data and graph data. This paper presents epiC, an extensible system to tackle Big Data's data variety challenge. epiC introduces a general Actor-like concurrent programming model, independent of the data processing models, for specifying parallel computations. Users process multi-structured datasets with appropriate epiC extensions: an implementation of a data processing model best suited to the data type, plus auxiliary code for mapping that data processing model into epiC's concurrent programming model. As in Hadoop, programs written in this way can be automatically parallelized, and the runtime system takes care of fault tolerance and inter-machine communication. We present the design and implementation of epiC's concurrent programming model. We also present two customized data processing models, an optimized MapReduce extension and a relational model, on top of epiC. Experiments demonstrate the effectiveness and efficiency of epiC. |
---|---|
ISSN: | 2150-8097 |
DOI: | 10.14778/2732286.2732291 |
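
The summary describes mapping a data processing model such as MapReduce onto an Actor-like concurrent programming model. This record does not show epiC's actual interfaces, so the following is only a minimal sketch of that general idea under stated assumptions: every name here (`Unit`, `MapUnit`, `ReduceUnit`, `run`, `word_count`) is hypothetical, the scheduler is a single-threaded stand-in for a distributed runtime, and fault tolerance is not modeled.

```python
from collections import Counter, deque
import zlib


class Unit:
    """Hypothetical actor-like unit: private state plus a mailbox."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()

    def receive(self, message, send):
        raise NotImplementedError


class MapUnit(Unit):
    """Turns a chunk of text into (word, 1) messages for the reduce units."""

    def __init__(self, name, reducer_names):
        super().__init__(name)
        self.reducer_names = reducer_names

    def receive(self, message, send):
        for word in message.split():
            # crc32 gives a partition that is stable across runs,
            # so every occurrence of a word reaches the same reducer.
            index = zlib.crc32(word.encode()) % len(self.reducer_names)
            send(self.reducer_names[index], (word, 1))


class ReduceUnit(Unit):
    """Accumulates counts for the words routed to it."""

    def __init__(self, name):
        super().__init__(name)
        self.counts = Counter()

    def receive(self, message, send):
        word, n = message
        self.counts[word] += n


def run(units, initial_messages):
    """Deliver messages until every mailbox is empty."""
    registry = {unit.name: unit for unit in units}

    def send(target, message):
        registry[target].mailbox.append(message)

    for target, message in initial_messages:
        send(target, message)

    pending = True
    while pending:
        pending = False
        for unit in units:
            while unit.mailbox:
                pending = True
                unit.receive(unit.mailbox.popleft(), send)


def word_count(chunks, n_mappers=2, n_reducers=2):
    """The 'auxiliary code': wires a MapReduce job onto the units above."""
    reducer_names = [f"reduce-{i}" for i in range(n_reducers)]
    mappers = [MapUnit(f"map-{i}", reducer_names) for i in range(n_mappers)]
    reducers = [ReduceUnit(name) for name in reducer_names]
    # Spread input chunks over the map units round-robin.
    messages = [(mappers[i % n_mappers].name, chunk)
                for i, chunk in enumerate(chunks)]
    run(mappers + reducers, messages)
    total = Counter()
    for reducer in reducers:
        total.update(reducer.counts)
    return dict(total)
```

In this sketch the units only ever interact through `send`, so the word-count logic is independent of how messages are delivered; a different data processing model would supply different unit classes and wiring while reusing `run` unchanged.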