epiC: an extensible and scalable system for processing big data

Bibliographic details
Published in:	Proceedings of the VLDB Endowment, 2014-03, Vol. 7 (7), pp. 541-552
Main authors:	Jiang, Dawei; Chen, Gang; Ooi, Beng Chin; Tan, Kian-Lee; Wu, Sai
Format:	Article
Language:	English
Online access:	Full text

container_end_page	552
container_issue	7
container_start_page	541
container_title	Proceedings of the VLDB Endowment
container_volume	7
creator	Jiang, Dawei; Chen, Gang; Ooi, Beng Chin; Tan, Kian-Lee; Wu, Sai
description	The Big Data problem is characterized by the so-called 3V features: Volume (a huge amount of data), Velocity (a high data ingestion rate), and Variety (a mix of structured, semi-structured, and unstructured data). The state-of-the-art solutions to the Big Data problem are largely based on the MapReduce framework (or its open-source implementation, Hadoop). Although Hadoop handles the data volume challenge successfully, it does not deal well with data variety, since its programming interface and associated data processing model are inconvenient and inefficient for handling structured data and graph data. This paper presents epiC, an extensible system that tackles the data variety challenge of Big Data. epiC introduces a general Actor-like concurrent programming model, independent of the data processing models, for specifying parallel computations. Users process multi-structured datasets with appropriate epiC extensions, each consisting of the implementation of a data processing model best suited for the data type and auxiliary code for mapping that data processing model into epiC's concurrent programming model. As in Hadoop, programs written in this way can be automatically parallelized, and the runtime system takes care of fault tolerance and inter-machine communication. We present the design and implementation of epiC's concurrent programming model. We also present two customized data processing models on top of epiC: an optimized MapReduce extension and a relational model. Experiments demonstrate the effectiveness and efficiency of the proposed epiC.
doi_str_mv	10.14778/2732286.2732291
format	Article
fulltext	fulltext
identifier	ISSN: 2150-8097
ispartof	Proceedings of the VLDB Endowment, 2014-03, Vol.7 (7), p.541-552
issn	2150-8097 (print); 2150-8097 (electronic)
language	eng
recordid	cdi_crossref_primary_10_14778_2732286_2732291
source	ACM Digital Library Complete
title	epiC: an extensible and scalable system for processing big data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T23%3A09%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=epiC:%20an%20extensible%20and%20scalable%20system%20for%20processing%20big%20data&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Jiang,%20Dawei&rft.date=2014-03-01&rft.volume=7&rft.issue=7&rft.spage=541&rft.epage=552&rft.pages=541-552&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/2732286.2732291&rft_dat=%3Ccrossref%3E10_14778_2732286_2732291%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
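The abstract describes a layered design. epiC provides a general Actor-like concurrent programming model, and a data processing model such as MapReduce is added as an extension that maps onto it. The Python sketch below illustrates that layering only under stated assumptions. All names are hypothetical and are not epiC's API: `Unit`, `Message`, `run`, and `make_word_count`. Independent units each react to a mailbox of messages and emit messages addressed to other units. A word count written in MapReduce style is then expressed as map units and reduce units on top of that model. Everything runs on threads in a single process. The sketch ignores fault tolerance and inter-machine communication, which according to the abstract are handled by epiC's runtime system.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch, NOT epiC's actual API: an actor-like "unit" reacts to
# the messages in its mailbox and returns new messages addressed to other units.

@dataclass
class Message:
    target: str      # name of the receiving unit (hypothetical addressing scheme)
    payload: object

@dataclass
class Unit:
    name: str
    handler: Callable[[List[object]], List[Message]]

def run(units: Dict[str, Unit], initial: List[Message], workers: int = 4) -> Dict[str, List[object]]:
    """Deliver messages in rounds; units with pending mail run in parallel.
    Messages addressed to names that are not units are collected as output."""
    output: Dict[str, List[object]] = {}
    pending = initial
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Group this round's messages into per-unit mailboxes.
            mailboxes: Dict[str, List[object]] = {}
            for msg in pending:
                if msg.target in units:
                    mailboxes.setdefault(msg.target, []).append(msg.payload)
                else:
                    output.setdefault(msg.target, []).append(msg.payload)
            # Each unit processes its whole mailbox independently of the others.
            futures = [pool.submit(units[n].handler, box) for n, box in mailboxes.items()]
            pending = [m for f in futures for m in f.result()]
    return output

def make_word_count(num_splits: int, num_reducers: int) -> Dict[str, Unit]:
    """A MapReduce-style 'extension' expressed as ordinary units."""
    def mapper(box: List[object]) -> List[Message]:
        out = []
        for text in box:
            for word, c in Counter(text.split()).items():
                # Hash partitioning: a given word always goes to the same reducer.
                out.append(Message(f"reduce-{hash(word) % num_reducers}", (word, c)))
        return out

    def reducer(box: List[object]) -> List[Message]:
        total = Counter()
        for word, c in box:
            total[word] += c
        return [Message("result", dict(total))]

    units = {f"map-{i}": Unit(f"map-{i}", mapper) for i in range(num_splits)}
    units.update({f"reduce-{r}": Unit(f"reduce-{r}", reducer) for r in range(num_reducers)})
    return units

# Usage: three input splits, two reducers.
splits = ["a b a", "b c", "a"]
units = make_word_count(num_splits=len(splits), num_reducers=2)
result = run(units, [Message(f"map-{i}", s) for i, s in enumerate(splits)])
merged = Counter()
for part in result["result"]:
    merged.update(part)
print(dict(sorted(merged.items())))  # {'a': 3, 'b': 2, 'c': 1}
```

In this sketch, the generic layer (`run`) knows nothing about map or reduce. The word-count logic lives entirely in the units built by `make_word_count`. This mirrors the separation the abstract draws between the concurrent programming model and the data processing models built on it. It is not a claim about how epiC implements that separation.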