Peeking into the optimization of data flow programs with MapReduce-style UDFs

Data flows are a popular abstraction to define data-intensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict t...

Bibliographic details
Main authors:	Hueske, F., Peters, M., Krettek, A., Ringwald, M., Tzoumas, K., Markl, V., Freytag, J.
Format:	Conference proceeding
Language:	eng
Subjects:	Data mining; Data processing; Data visualization; Optimization; Programming; Query processing; Runtime

container_end_page	1295
container_issue
container_start_page	1292
container_title
container_volume
creator	Hueske, F.; Peters, M.; Krettek, A.; Ringwald, M.; Tzoumas, K.; Markl, V.; Freytag, J.
description	Data flows are a popular abstraction to define data-intensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and nonrelational data flow programs which highlight the salient features of our approach.
doi_str_mv	10.1109/ICDE.2013.6544927
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6544927</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6544927</ieee_id><sourcerecordid>6544927</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-148c7ed739948803b656b0b9c8555863fb624e2ce1260c48fe78c3a68148a43a3</originalsourceid><addsrcrecordid>eNpNkMtKw0AYhccbGGofQNzMCyTOzD_XpfSihRZFLLgrk_RPO5o0IZlS6tNbsAvP5izOx4FzCLnnLOOcucfZaDzJBOOQaSWlE-aCDJ2xXGoD0p2IS5IIMCplQn9e_c-Yg2uScKYh1WDFLRn2_Rc7yUnOFUvI4g3xO-w2NOxiQ-MWadPGUIcfH0Ozo01J1z56WlbNgbZds-l83dNDiFu68O07rvcFpn08VkiX42l_R25KX_U4PPuALKeTj9FLOn99no2e5mngRsWUS1sYXBtwTlrLINdK5yx3hVVKWQ1lroVEUSAXmhXSlmhsAV6fVlkvwcOAPPz1BkRctV2ofXdcnc-BX0YwUo4</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Peeking into the optimization of data flow programs with MapReduce-style UDFs</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Hueske, F. ; Peters, M. ; Krettek, A. ; Ringwald, M. ; Tzoumas, K. ; Markl, V. ; Freytag, J.</creator><creatorcontrib>Hueske, F. ; Peters, M. ; Krettek, A. ; Ringwald, M. ; Tzoumas, K. ; Markl, V. ; Freytag, J.</creatorcontrib><description>Data flows are a popular abstraction to define dataintensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. 
We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderbility of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and nonrelational data flow programs which highlight the salient features of our approach.</description><identifier>ISSN: 1063-6382</identifier><identifier>ISBN: 9781467349093</identifier><identifier>ISBN: 1467349097</identifier><identifier>EISSN: 2375-026X</identifier><identifier>EISBN: 9781467349109</identifier><identifier>EISBN: 1467349089</identifier><identifier>EISBN: 1467349100</identifier><identifier>EISBN: 9781467349086</identifier><identifier>DOI: 10.1109/ICDE.2013.6544927</identifier><language>eng</language><publisher>IEEE</publisher><subject>Data mining ; Data processing ; Data visualization ; Optimization ; Programming ; Query processing ; Runtime</subject><ispartof>2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013, 
p.1292-1295</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6544927$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2056,27924,54919</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6544927$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Hueske, F.</creatorcontrib><creatorcontrib>Peters, M.</creatorcontrib><creatorcontrib>Krettek, A.</creatorcontrib><creatorcontrib>Ringwald, M.</creatorcontrib><creatorcontrib>Tzoumas, K.</creatorcontrib><creatorcontrib>Markl, V.</creatorcontrib><creatorcontrib>Freytag, J.</creatorcontrib><title>Peeking into the optimization of data flow programs with MapReduce-style UDFs</title><title>2013 IEEE 29th International Conference on Data Engineering (ICDE)</title><addtitle>ICDE</addtitle><description>Data flows are a popular abstraction to define dataintensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderbility of UDF operators. 
This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and nonrelational data flow programs which highlight the salient features of our approach.</description><subject>Data mining</subject><subject>Data processing</subject><subject>Data visualization</subject><subject>Optimization</subject><subject>Programming</subject><subject>Query processing</subject><subject>Runtime</subject><issn>1063-6382</issn><issn>2375-026X</issn><isbn>9781467349093</isbn><isbn>1467349097</isbn><isbn>9781467349109</isbn><isbn>1467349089</isbn><isbn>1467349100</isbn><isbn>9781467349086</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpNkMtKw0AYhccbGGofQNzMCyTOzD_XpfSihRZFLLgrk_RPO5o0IZlS6tNbsAvP5izOx4FzCLnnLOOcucfZaDzJBOOQaSWlE-aCDJ2xXGoD0p2IS5IIMCplQn9e_c-Yg2uScKYh1WDFLRn2_Rc7yUnOFUvI4g3xO-w2NOxiQ-MWadPGUIcfH0Ozo01J1z56WlbNgbZds-l83dNDiFu68O07rvcFpn08VkiX42l_R25KX_U4PPuALKeTj9FLOn99no2e5mngRsWUS1sYXBtwTlrLINdK5yx3hVVKWQ1lroVEUSAXmhXSlmhsAV6fVlkvwcOAPPz1BkRctV2ofXdcnc-BX0YwUo4</recordid><startdate>201304</startdate><enddate>201304</enddate><creator>Hueske, F.</creator><creator>Peters, M.</creator><creator>Krettek, A.</creator><creator>Ringwald, M.</creator><creator>Tzoumas, K.</creator><creator>Markl, V.</creator><creator>Freytag, 
J.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201304</creationdate><title>Peeking into the optimization of data flow programs with MapReduce-style UDFs</title><author>Hueske, F. ; Peters, M. ; Krettek, A. ; Ringwald, M. ; Tzoumas, K. ; Markl, V. ; Freytag, J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-148c7ed739948803b656b0b9c8555863fb624e2ce1260c48fe78c3a68148a43a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Data mining</topic><topic>Data processing</topic><topic>Data visualization</topic><topic>Optimization</topic><topic>Programming</topic><topic>Query processing</topic><topic>Runtime</topic><toplevel>online_resources</toplevel><creatorcontrib>Hueske, F.</creatorcontrib><creatorcontrib>Peters, M.</creatorcontrib><creatorcontrib>Krettek, A.</creatorcontrib><creatorcontrib>Ringwald, M.</creatorcontrib><creatorcontrib>Tzoumas, K.</creatorcontrib><creatorcontrib>Markl, V.</creatorcontrib><creatorcontrib>Freytag, J.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hueske, F.</au><au>Peters, M.</au><au>Krettek, A.</au><au>Ringwald, M.</au><au>Tzoumas, K.</au><au>Markl, V.</au><au>Freytag, J.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Peeking into the optimization of data flow programs with MapReduce-style UDFs</atitle><btitle>2013 
IEEE 29th International Conference on Data Engineering (ICDE)</btitle><stitle>ICDE</stitle><date>2013-04</date><risdate>2013</risdate><spage>1292</spage><epage>1295</epage><pages>1292-1295</pages><issn>1063-6382</issn><eissn>2375-026X</eissn><isbn>9781467349093</isbn><isbn>1467349097</isbn><eisbn>9781467349109</eisbn><eisbn>1467349089</eisbn><eisbn>1467349100</eisbn><eisbn>9781467349086</eisbn><abstract>Data flows are a popular abstraction to define dataintensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderbility of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. 
For the demonstration, we provide a selection of relational and nonrelational data flow programs which highlight the salient features of our approach.</abstract><pub>IEEE</pub><doi>10.1109/ICDE.2013.6544927</doi><tpages>4</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1063-6382
ispartof	2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013, p.1292-1295
issn	1063-6382 2375-026X
language	eng
recordid	cdi_ieee_primary_6544927
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Data mining; Data processing; Data visualization; Optimization; Programming; Query processing; Runtime
title	Peeking into the optimization of data flow programs with MapReduce-style UDFs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T00%3A07%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Peeking%20into%20the%20optimization%20of%20data%20flow%20programs%20with%20MapReduce-style%20UDFs&rft.btitle=2013%20IEEE%2029th%20International%20Conference%20on%20Data%20Engineering%20(ICDE)&rft.au=Hueske,%20F.&rft.date=2013-04&rft.spage=1292&rft.epage=1295&rft.pages=1292-1295&rft.issn=1063-6382&rft.eissn=2375-026X&rft.isbn=9781467349093&rft.isbn_list=1467349097&rft_id=info:doi/10.1109/ICDE.2013.6544927&rft_dat=%3Cieee_6IE%3E6544927%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781467349109&rft.eisbn_list=1467349089&rft.eisbn_list=1467349100&rft.eisbn_list=9781467349086&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6544927&rfr_iscdi=true