Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world---petabytes of data distributed across large cloud deployments consisting of thousands of chea...
Main Authors: | Crotty, Andrew; Galakatos, Alex; Dursun, Kayhan; Kraska, Tim; Cetintemel, Ugur; Zdonik, Stan |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Databases |
Online Access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Crotty, Andrew ; Galakatos, Alex ; Dursun, Kayhan ; Kraska, Tim ; Cetintemel, Ugur ; Zdonik, Stan |
description | There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems. |
doi_str_mv | 10.48550/arxiv.1406.6667 |
format | Article |
creationdate | 2014-06-25 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
oa | free_for_read |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1406.6667 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_1406_6667 |
source | arXiv.org |
subjects | Computer Science - Databases |
title | Tupleware: Redefining Modern Analytics |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T10%3A27%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Tupleware:%20Redefining%20Modern%20Analytics&rft.au=Crotty,%20Andrew&rft.date=2014-06-25&rft_id=info:doi/10.48550/arxiv.1406.6667&rft_dat=%3Carxiv_GOX%3E1406_6667%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |