Evaluating end-to-end optimization for data analytics applications in Weld

Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existi...

Detailed description


Bibliographic details
Published in:	Proceedings of the VLDB Endowment 2018-05, Vol.11 (9), p.1002-1015
Main authors:	Palkar, Shoumik ; Thomas, James ; Narayanan, Deepak ; Thaker, Pratiksha ; Palamuttam, Rahul ; Negi, Parimajan ; Shanbhag, Anil ; Schwarzkopf, Malte ; Pirk, Holger ; Amarasinghe, Saman ; Madden, Samuel ; Zaharia, Matei
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	1015
container_issue	9
container_start_page	1002
container_title	Proceedings of the VLDB Endowment
container_volume	11
creator	Palkar, Shoumik ; Thomas, James ; Narayanan, Deepak ; Thaker, Pratiksha ; Palamuttam, Rahul ; Negi, Parimajan ; Shanbhag, Anil ; Schwarzkopf, Malte ; Pirk, Holger ; Amarasinghe, Saman ; Madden, Samuel ; Zaharia, Matei
description	Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications, and evaluating its impact on realistic data science workloads. Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy, and uses lightweight measurements to make data-dependent decisions at run-time in ad-hoc workloads where no statistics are available, with sub-second overhead. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization. Moreover, Weld provides benefits if even just 4-5 operators in a library are ported to use it. Our results show that common runtime designs like Weld may be a viable approach to accelerate analytics.
doi_str_mv	10.14778/3213880.3213890
format	Article
fullrecord	<record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_14778_3213880_3213890</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_14778_3213880_3213890</sourcerecordid><originalsourceid>FETCH-LOGICAL-c215t-67bca1c58338fe362a791cef63f55d7f6fd4245a988b65791a846e882796eaa73</originalsourceid><addsrcrecordid>eNpNkFFLwzAUhYMoODfffcwfyEyaJrl9lDF1MvDFPZe7NJFIl5YmKvPXW2offPoO98Dh8hFyJ_halMbAvSyEBODriRW_IItCKM6AV-byX74mNyl9cK5BC1iQl-0Xtp-YQ3ynLjYsd2wE7focTuFnvHeR-m6gDWakGLE952ATxb5vg53qREOk365tVuTKY5vc7cwlOTxu3zbPbP_6tNs87Jkdn8hMm6NFYRVICd5JXaCphHVeS69UY7z2TVmUCiuAo1Zjh1BqB1CYSjtEI5eE_-3aoUtpcL7uh3DC4VwLXk8u6tlFPbuQv7iAUdQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Evaluating end-to-end optimization for data analytics applications in weld</title><source>ACM Digital Library Complete</source><creator>Palkar, Shoumik ; Thomas, James ; Narayanan, Deepak ; Thaker, Pratiksha ; Palamuttam, Rahul ; Negi, Parimajan ; Shanbhag, Anil ; Schwarzkopf, Malte ; Pirk, Holger ; Amarasinghe, Saman ; Madden, Samuel ; Zaharia, Matei</creator><creatorcontrib>Palkar, Shoumik ; Thomas, James ; Narayanan, Deepak ; Thaker, Pratiksha ; Palamuttam, Rahul ; Negi, Parimajan ; Shanbhag, Anil ; Schwarzkopf, Malte ; Pirk, Holger ; Amarasinghe, Saman ; Madden, Samuel ; Zaharia, Matei</creatorcontrib><description>Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. 
In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications, and evaluating its impact on realistic data science workloads. Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy, and uses lightweight measurements to make data-dependent decisions at run-time in ad-hoc workloads where no statistics are available, with sub-second overhead. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization. Moreover, Weld provides benefits if even just 4--5 operators in a library are ported to use it. Our results show that common runtime designs like Weld may be a viable approach to accelerate analytics.</description><identifier>ISSN: 2150-8097</identifier><identifier>EISSN: 2150-8097</identifier><identifier>DOI: 10.14778/3213880.3213890</identifier><language>eng</language><ispartof>Proceedings of the VLDB Endowment, 2018-05, Vol.11 (9), p.1002-1015</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c215t-67bca1c58338fe362a791cef63f55d7f6fd4245a988b65791a846e882796eaa73</citedby><cites>FETCH-LOGICAL-c215t-67bca1c58338fe362a791cef63f55d7f6fd4245a988b65791a846e882796eaa73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Palkar, Shoumik</creatorcontrib><creatorcontrib>Thomas, James</creatorcontrib><creatorcontrib>Narayanan, 
Deepak</creatorcontrib><creatorcontrib>Thaker, Pratiksha</creatorcontrib><creatorcontrib>Palamuttam, Rahul</creatorcontrib><creatorcontrib>Negi, Parimajan</creatorcontrib><creatorcontrib>Shanbhag, Anil</creatorcontrib><creatorcontrib>Schwarzkopf, Malte</creatorcontrib><creatorcontrib>Pirk, Holger</creatorcontrib><creatorcontrib>Amarasinghe, Saman</creatorcontrib><creatorcontrib>Madden, Samuel</creatorcontrib><creatorcontrib>Zaharia, Matei</creatorcontrib><title>Evaluating end-to-end optimization for data analytics applications in weld</title><title>Proceedings of the VLDB Endowment</title><description>Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications, and evaluating its impact on realistic data science workloads. Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy, and uses lightweight measurements to make data-dependent decisions at run-time in ad-hoc workloads where no statistics are available, with sub-second overhead. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization. Moreover, Weld provides benefits if even just 4--5 operators in a library are ported to use it. 
Our results show that common runtime designs like Weld may be a viable approach to accelerate analytics.</description><issn>2150-8097</issn><issn>2150-8097</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNpNkFFLwzAUhYMoODfffcwfyEyaJrl9lDF1MvDFPZe7NJFIl5YmKvPXW2offPoO98Dh8hFyJ_halMbAvSyEBODriRW_IItCKM6AV-byX74mNyl9cK5BC1iQl-0Xtp-YQ3ynLjYsd2wE7focTuFnvHeR-m6gDWakGLE952ATxb5vg53qREOk365tVuTKY5vc7cwlOTxu3zbPbP_6tNs87Jkdn8hMm6NFYRVICd5JXaCphHVeS69UY7z2TVmUCiuAo1Zjh1BqB1CYSjtEI5eE_-3aoUtpcL7uh3DC4VwLXk8u6tlFPbuQv7iAUdQ</recordid><startdate>201805</startdate><enddate>201805</enddate><creator>Palkar, Shoumik</creator><creator>Thomas, James</creator><creator>Narayanan, Deepak</creator><creator>Thaker, Pratiksha</creator><creator>Palamuttam, Rahul</creator><creator>Negi, Parimajan</creator><creator>Shanbhag, Anil</creator><creator>Schwarzkopf, Malte</creator><creator>Pirk, Holger</creator><creator>Amarasinghe, Saman</creator><creator>Madden, Samuel</creator><creator>Zaharia, Matei</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>201805</creationdate><title>Evaluating end-to-end optimization for data analytics applications in weld</title><author>Palkar, Shoumik ; Thomas, James ; Narayanan, Deepak ; Thaker, Pratiksha ; Palamuttam, Rahul ; Negi, Parimajan ; Shanbhag, Anil ; Schwarzkopf, Malte ; Pirk, Holger ; Amarasinghe, Saman ; Madden, Samuel ; Zaharia, Matei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c215t-67bca1c58338fe362a791cef63f55d7f6fd4245a988b65791a846e882796eaa73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Palkar, Shoumik</creatorcontrib><creatorcontrib>Thomas, James</creatorcontrib><creatorcontrib>Narayanan, 
Deepak</creatorcontrib><creatorcontrib>Thaker, Pratiksha</creatorcontrib><creatorcontrib>Palamuttam, Rahul</creatorcontrib><creatorcontrib>Negi, Parimajan</creatorcontrib><creatorcontrib>Shanbhag, Anil</creatorcontrib><creatorcontrib>Schwarzkopf, Malte</creatorcontrib><creatorcontrib>Pirk, Holger</creatorcontrib><creatorcontrib>Amarasinghe, Saman</creatorcontrib><creatorcontrib>Madden, Samuel</creatorcontrib><creatorcontrib>Zaharia, Matei</creatorcontrib><collection>CrossRef</collection><jtitle>Proceedings of the VLDB Endowment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Palkar, Shoumik</au><au>Thomas, James</au><au>Narayanan, Deepak</au><au>Thaker, Pratiksha</au><au>Palamuttam, Rahul</au><au>Negi, Parimajan</au><au>Shanbhag, Anil</au><au>Schwarzkopf, Malte</au><au>Pirk, Holger</au><au>Amarasinghe, Saman</au><au>Madden, Samuel</au><au>Zaharia, Matei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluating end-to-end optimization for data analytics applications in weld</atitle><jtitle>Proceedings of the VLDB Endowment</jtitle><date>2018-05</date><risdate>2018</risdate><volume>11</volume><issue>9</issue><spage>1002</spage><epage>1015</epage><pages>1002-1015</pages><issn>2150-8097</issn><eissn>2150-8097</eissn><abstract>Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications, and evaluating its impact on realistic data science workloads. 
Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy, and uses lightweight measurements to make data-dependent decisions at run-time in ad-hoc workloads where no statistics are available, with sub-second overhead. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization. Moreover, Weld provides benefits if even just 4--5 operators in a library are ported to use it. Our results show that common runtime designs like Weld may be a viable approach to accelerate analytics.</abstract><doi>10.14778/3213880.3213890</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2150-8097
ispartof	Proceedings of the VLDB Endowment, 2018-05, Vol.11 (9), p.1002-1015
issn	2150-8097 2150-8097
language	eng
recordid	cdi_crossref_primary_10_14778_3213880_3213890
source	ACM Digital Library Complete
title	Evaluating end-to-end optimization for data analytics applications in Weld
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T17%3A51%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20end-to-end%20optimization%20for%20data%20analytics%20applications%20in%20weld&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Palkar,%20Shoumik&rft.date=2018-05&rft.volume=11&rft.issue=9&rft.spage=1002&rft.epage=1015&rft.pages=1002-1015&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/3213880.3213890&rft_dat=%3Ccrossref%3E10_14778_3213880_3213890%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true