Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees

When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expr...

Full description

Bibliographic details
Published in: Proceedings of the VLDB Endowment 2021-03, Vol.14 (7), p.1228-1240
Main authors: Jankov, Dimitrije; Yuan, Binhang; Luo, Shangyu; Jermaine, Chris
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1240
container_issue 7
container_start_page 1228
container_title Proceedings of the VLDB Endowment
container_volume 14
creator Jankov, Dimitrije
Yuan, Binhang
Luo, Shangyu
Jermaine, Chris
description When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expressed relationally, as aggregated join trees (that is, expressed as a series of relational joins followed by an aggregation). In a pilot run, lineage information is collected; this lineage is used to optimally plan the computation at the level of individual records. Then, the computation is actually executed. We show experimentally that a relational system making use of this two-phase strategy can be an excellent platform for distributed ML computations.
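The description above can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions, not the authors' system: it treats matrix multiplication as an aggregated join tree (a join on `A.col = B.row` followed by a SUM aggregation), runs a pilot pass over keys to collect lineage, plans work per output record, and then executes. The relations `A` and `B`, the function names `pilot_run`, `plan`, and `execute`, and the greedy least-loaded assignment are all hypothetical; in particular, the greedy planner is a simple stand-in for the optimal planning the paper refers to.

```python
from collections import defaultdict

# Hypothetical toy relations: each matrix is stored as {(row, col): value}.
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
B = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}


def pilot_run(a_keys, b_keys):
    """Phase 1: look only at keys and record which record pairs join
    and which output group each pair feeds (the lineage)."""
    lineage = defaultdict(list)  # output key -> list of (a_key, b_key)
    for (i, k) in a_keys:
        for (k2, j) in b_keys:
            if k == k2:  # join predicate A.col = B.row
                lineage[(i, j)].append(((i, k), (k2, j)))
    return lineage


def plan(lineage, num_workers):
    """Assign each output group to the currently least-loaded worker.
    Assumption: a greedy stand-in, not the planner from the paper."""
    load = [0] * num_workers
    assignment = {}
    # Place the groups with the most joined pairs first.
    for out_key, pairs in sorted(lineage.items(), key=lambda kv: -len(kv[1])):
        worker = min(range(num_workers), key=lambda n: load[n])
        assignment[out_key] = worker
        load[worker] += len(pairs)
    return assignment


def execute(lineage, assignment, a, b, num_workers):
    """Phase 2: each simulated worker computes only its assigned groups."""
    result = {}
    for worker in range(num_workers):
        for out_key, pairs in lineage.items():
            if assignment[out_key] == worker:
                # Aggregation step: SUM over the joined pairs of this group.
                result[out_key] = sum(a[ak] * b[bk] for ak, bk in pairs)
    return result


lineage = pilot_run(A.keys(), B.keys())
assignment = plan(lineage, num_workers=2)
C = execute(lineage, assignment, A, B, num_workers=2)
```

The point of the split is that the pilot pass touches only keys, so the planner sees exactly which individual records will meet before any numerical work is done, and can place that work accordingly.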
doi_str_mv 10.14778/3450980.3450991
format Article
publisher New York: Assoc Computing Machinery
stitle PROC VLDB ENDOW
date 2021-03-01
tpages 13
eissn 2150-8097
orcid https://orcid.org/0009-0008-0799-4526
https://orcid.org/0000-0002-3188-2769
toplevel peer_reviewed
online_resources
woscitedreferencescount 5
collection Web of Science Core Collection
Science Citation Index Expanded
Web of Science - Science Citation Index Expanded - 2021
CrossRef
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2021-03, Vol.14 (7), p.1228-1240
issn 2150-8097
2150-8097
language eng
recordid cdi_webofscience_primary_000658497300012
source Web of Science - Science Citation Index Expanded - 2021; ACM Digital Library
subjects Computer Science
Computer Science, Information Systems
Computer Science, Theory & Methods
Science & Technology
Technology
title Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-05T02%3A50%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-webofscience_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Distributed%20Numerical%20and%20Machine%20Learning%20Computations%20via%20Two-Phase%20Execution%20of%20Aggregated%20Join%20Trees&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Jankov,%20Dimitrije&rft.date=2021-03-01&rft.volume=14&rft.issue=7&rft.spage=1228&rft.epage=1240&rft.pages=1228-1240&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/3450980.3450991&rft_dat=%3Cwebofscience_cross%3E000658497300012%3C/webofscience_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true