Peeking into the optimization of data flow programs with MapReduce-style UDFs

Bibliographic details
Main authors: Hueske, F., Peters, M., Krettek, A., Ringwald, M., Tzoumas, K., Markl, V., Freytag, J.
Format: Conference proceeding
Language: English
container_end_page 1295
container_issue
container_start_page 1292
container_title
container_volume
creator Hueske, F.
Peters, M.
Krettek, A.
Ringwald, M.
Tzoumas, K.
Markl, V.
Freytag, J.
description Data flows are a popular abstraction to define data-intensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMSs, MapReduce-style UDFs have less strict templates. These templates alone do not provide all the information needed to decide whether the UDFs can be reordered with relational operators and other UDFs. However, it is well known that reordering operators such as filters, joins, and aggregations can yield runtime improvements of orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs, which is used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers, including filter and aggregation push-down, bushy join orders, and the choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step by step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and non-relational data flow programs that highlight the salient features of our approach.
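The description above says that information extracted from UDF code by static analysis decides whether operators may be reordered, for example to push a filter down. As an illustration only, the sketch below assumes that this information is the set of record fields each UDF reads and writes, and applies a conservative conflict check to move a filter towards the data source. `UdfProps`, `can_swap`, `push_down`, `udf`, and all field names are hypothetical; this is not the optimizer described in the paper, and a real optimizer must also account for how many records each UDF emits per input record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfProps:
    """Hypothetical summary that static code analysis might extract from a
    record-at-a-time (Map-style) UDF: the record fields it reads and writes."""
    name: str
    reads: frozenset[str]
    writes: frozenset[str]


def can_swap(first: UdfProps, second: UdfProps) -> bool:
    """Conservative reordering test for two adjacent UDFs: no write-read,
    read-write, or write-write conflict on any record field."""
    return (
        first.writes.isdisjoint(second.reads)
        and first.reads.isdisjoint(second.writes)
        and first.writes.isdisjoint(second.writes)
    )


def push_down(pipeline: list[UdfProps], index: int) -> list[UdfProps]:
    """Move the operator at `index` towards the source for as long as it can be
    swapped with its predecessor. Assumes (hypothetically) that every operator
    it passes emits exactly one output record per input record."""
    plan = list(pipeline)
    while index > 0 and can_swap(plan[index - 1], plan[index]):
        plan[index - 1], plan[index] = plan[index], plan[index - 1]
        index -= 1
    return plan


def udf(name: str, reads: set[str], writes: set[str]) -> UdfProps:
    """Convenience constructor for the hypothetical UDF summaries."""
    return UdfProps(name, frozenset(reads), frozenset(writes))


# Hypothetical data flow: enrich -> tag -> filter.
enrich = udf("enrich", {"price"}, {"price_eur"})
tag = udf("tag", {"category"}, {"label"})
by_country = udf("filter_country", {"country"}, set())
by_price = udf("filter_price", {"price_eur"}, set())

print([op.name for op in push_down([enrich, tag, by_country], 2)])
# → ['filter_country', 'enrich', 'tag']
print([op.name for op in push_down([enrich, tag, by_price], 2)])
# → ['enrich', 'filter_price', 'tag']
```

The country filter moves all the way to the source because it touches no field that the other UDFs write. The price filter stops behind `enrich`, because it reads the `price_eur` field that `enrich` produces.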
doi_str_mv 10.1109/ICDE.2013.6544927
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6544927</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6544927</ieee_id><sourcerecordid>6544927</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-148c7ed739948803b656b0b9c8555863fb624e2ce1260c48fe78c3a68148a43a3</originalsourceid><addsrcrecordid>eNpNkMtKw0AYhccbGGofQNzMCyTOzD_XpfSihRZFLLgrk_RPO5o0IZlS6tNbsAvP5izOx4FzCLnnLOOcucfZaDzJBOOQaSWlE-aCDJ2xXGoD0p2IS5IIMCplQn9e_c-Yg2uScKYh1WDFLRn2_Rc7yUnOFUvI4g3xO-w2NOxiQ-MWadPGUIcfH0Ozo01J1z56WlbNgbZds-l83dNDiFu68O07rvcFpn08VkiX42l_R25KX_U4PPuALKeTj9FLOn99no2e5mngRsWUS1sYXBtwTlrLINdK5yx3hVVKWQ1lroVEUSAXmhXSlmhsAV6fVlkvwcOAPPz1BkRctV2ofXdcnc-BX0YwUo4</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Peeking into the optimization of data flow programs with MapReduce-style UDFs</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Hueske, F. ; Peters, M. ; Krettek, A. ; Ringwald, M. ; Tzoumas, K. ; Markl, V. ; Freytag, J.</creator><creatorcontrib>Hueske, F. ; Peters, M. ; Krettek, A. ; Ringwald, M. ; Tzoumas, K. ; Markl, V. ; Freytag, J.</creatorcontrib><description>Data flows are a popular abstraction to define dataintensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. 
We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderbility of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and nonrelational data flow programs which highlight the salient features of our approach.</description><identifier>ISSN: 1063-6382</identifier><identifier>ISBN: 9781467349093</identifier><identifier>ISBN: 1467349097</identifier><identifier>EISSN: 2375-026X</identifier><identifier>EISBN: 9781467349109</identifier><identifier>EISBN: 1467349089</identifier><identifier>EISBN: 1467349100</identifier><identifier>EISBN: 9781467349086</identifier><identifier>DOI: 10.1109/ICDE.2013.6544927</identifier><language>eng</language><publisher>IEEE</publisher><subject>Data mining ; Data processing ; Data visualization ; Optimization ; Programming ; Query processing ; Runtime</subject><ispartof>2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013, 
p.1292-1295</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6544927$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2056,27924,54919</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6544927$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Hueske, F.</creatorcontrib><creatorcontrib>Peters, M.</creatorcontrib><creatorcontrib>Krettek, A.</creatorcontrib><creatorcontrib>Ringwald, M.</creatorcontrib><creatorcontrib>Tzoumas, K.</creatorcontrib><creatorcontrib>Markl, V.</creatorcontrib><creatorcontrib>Freytag, J.</creatorcontrib><title>Peeking into the optimization of data flow programs with MapReduce-style UDFs</title><title>2013 IEEE 29th International Conference on Data Engineering (ICDE)</title><addtitle>ICDE</addtitle><description>Data flows are a popular abstraction to define dataintensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderbility of UDF operators. 
This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and nonrelational data flow programs which highlight the salient features of our approach.</description><subject>Data mining</subject><subject>Data processing</subject><subject>Data visualization</subject><subject>Optimization</subject><subject>Programming</subject><subject>Query processing</subject><subject>Runtime</subject><issn>1063-6382</issn><issn>2375-026X</issn><isbn>9781467349093</isbn><isbn>1467349097</isbn><isbn>9781467349109</isbn><isbn>1467349089</isbn><isbn>1467349100</isbn><isbn>9781467349086</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpNkMtKw0AYhccbGGofQNzMCyTOzD_XpfSihRZFLLgrk_RPO5o0IZlS6tNbsAvP5izOx4FzCLnnLOOcucfZaDzJBOOQaSWlE-aCDJ2xXGoD0p2IS5IIMCplQn9e_c-Yg2uScKYh1WDFLRn2_Rc7yUnOFUvI4g3xO-w2NOxiQ-MWadPGUIcfH0Ozo01J1z56WlbNgbZds-l83dNDiFu68O07rvcFpn08VkiX42l_R25KX_U4PPuALKeTj9FLOn99no2e5mngRsWUS1sYXBtwTlrLINdK5yx3hVVKWQ1lroVEUSAXmhXSlmhsAV6fVlkvwcOAPPz1BkRctV2ofXdcnc-BX0YwUo4</recordid><startdate>201304</startdate><enddate>201304</enddate><creator>Hueske, F.</creator><creator>Peters, M.</creator><creator>Krettek, A.</creator><creator>Ringwald, M.</creator><creator>Tzoumas, K.</creator><creator>Markl, V.</creator><creator>Freytag, 
J.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201304</creationdate><title>Peeking into the optimization of data flow programs with MapReduce-style UDFs</title><author>Hueske, F. ; Peters, M. ; Krettek, A. ; Ringwald, M. ; Tzoumas, K. ; Markl, V. ; Freytag, J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-148c7ed739948803b656b0b9c8555863fb624e2ce1260c48fe78c3a68148a43a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Data mining</topic><topic>Data processing</topic><topic>Data visualization</topic><topic>Optimization</topic><topic>Programming</topic><topic>Query processing</topic><topic>Runtime</topic><toplevel>online_resources</toplevel><creatorcontrib>Hueske, F.</creatorcontrib><creatorcontrib>Peters, M.</creatorcontrib><creatorcontrib>Krettek, A.</creatorcontrib><creatorcontrib>Ringwald, M.</creatorcontrib><creatorcontrib>Tzoumas, K.</creatorcontrib><creatorcontrib>Markl, V.</creatorcontrib><creatorcontrib>Freytag, J.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hueske, F.</au><au>Peters, M.</au><au>Krettek, A.</au><au>Ringwald, M.</au><au>Tzoumas, K.</au><au>Markl, V.</au><au>Freytag, J.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Peeking into the optimization of data flow programs with MapReduce-style UDFs</atitle><btitle>2013 
IEEE 29th International Conference on Data Engineering (ICDE)</btitle><stitle>ICDE</stitle><date>2013-04</date><risdate>2013</risdate><spage>1292</spage><epage>1295</epage><pages>1292-1295</pages><issn>1063-6382</issn><eissn>2375-026X</eissn><isbn>9781467349093</isbn><isbn>1467349097</isbn><eisbn>9781467349109</eisbn><eisbn>1467349089</eisbn><eisbn>1467349100</eisbn><eisbn>9781467349086</eisbn><abstract>Data flows are a popular abstraction to define dataintensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates do not alone provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderbility of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. 
For the demonstration, we provide a selection of relational and nonrelational data flow programs which highlight the salient features of our approach.</abstract><pub>IEEE</pub><doi>10.1109/ICDE.2013.6544927</doi><tpages>4</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1063-6382
ispartof 2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013, p.1292-1295
issn 1063-6382
2375-026X
language eng
recordid cdi_ieee_primary_6544927
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Data mining
Data processing
Data visualization
Optimization
Programming
Query processing
Runtime
title Peeking into the optimization of data flow programs with MapReduce-style UDFs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T00%3A07%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Peeking%20into%20the%20optimization%20of%20data%20flow%20programs%20with%20MapReduce-style%20UDFs&rft.btitle=2013%20IEEE%2029th%20International%20Conference%20on%20Data%20Engineering%20(ICDE)&rft.au=Hueske,%20F.&rft.date=2013-04&rft.spage=1292&rft.epage=1295&rft.pages=1292-1295&rft.issn=1063-6382&rft.eissn=2375-026X&rft.isbn=9781467349093&rft.isbn_list=1467349097&rft_id=info:doi/10.1109/ICDE.2013.6544927&rft_dat=%3Cieee_6IE%3E6544927%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781467349109&rft.eisbn_list=1467349089&rft.eisbn_list=1467349100&rft.eisbn_list=9781467349086&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6544927&rfr_iscdi=true