Tupleware: Redefining Modern Analytics

Bibliographic details
Main authors: Crotty, Andrew; Galakatos, Alex; Dursun, Kayhan; Kraska, Tim; Cetintemel, Ugur; Zdonik, Stan
Format: Article
Language: English
creator Crotty, Andrew
Galakatos, Alex
Dursun, Kayhan
Kraska, Tim
Cetintemel, Ugur
Zdonik, Stan
description There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world---petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems.
doi_str_mv 10.48550/arxiv.1406.6667
format Article
creationdate 2014-06-25
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
linktorsrc https://arxiv.org/abs/1406.6667
identifier DOI: 10.48550/arxiv.1406.6667
language eng
recordid cdi_arxiv_primary_1406_6667
source arXiv.org
subjects Computer Science - Databases
title Tupleware: Redefining Modern Analytics