Peeking into the optimization of data flow programs with MapReduce-style UDFs
Main authors: | Hueske, F.; Peters, M.; Krettek, A.; Ringwald, M.; Tzoumas, K.; Markl, V.; Freytag, J. |
---|---|
Format: | Conference proceedings |
Language: | eng |
Keywords: | |
container_end_page | 1295 |
---|---|
container_issue | |
container_start_page | 1292 |
container_title | |
container_volume | |
creator | Hueske, F.; Peters, M.; Krettek, A.; Ringwald, M.; Tzoumas, K.; Markl, V.; Freytag, J. |
description | Data flows are a popular abstraction for defining data-intensive processing tasks. To support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMSs, MapReduce-style UDFs have less strict templates. These templates alone do not provide all the information needed to decide whether UDFs can be reordered with relational operators and other UDFs. However, it is well known that reordering operators such as filters, joins, and aggregations can improve runtime by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs, which is then used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers, including filter and aggregation push-down, bushy join orders, and the choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step by step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and non-relational data flow programs that highlight the salient features of our approach. |
doi_str_mv | 10.1109/ICDE.2013.6544927 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1063-6382 |
ispartof | 2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013, p.1292-1295 |
issn | 1063-6382 2375-026X |
language | eng |
recordid | cdi_ieee_primary_6544927 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Data mining; Data processing; Data visualization; Optimization; Programming; Query processing; Runtime |
title | Peeking into the optimization of data flow programs with MapReduce-style UDFs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T00%3A07%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Peeking%20into%20the%20optimization%20of%20data%20flow%20programs%20with%20MapReduce-style%20UDFs&rft.btitle=2013%20IEEE%2029th%20International%20Conference%20on%20Data%20Engineering%20(ICDE)&rft.au=Hueske,%20F.&rft.date=2013-04&rft.spage=1292&rft.epage=1295&rft.pages=1292-1295&rft.issn=1063-6382&rft.eissn=2375-026X&rft.isbn=9781467349093&rft.isbn_list=1467349097&rft_id=info:doi/10.1109/ICDE.2013.6544927&rft_dat=%3Cieee_6IE%3E6544927%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781467349109&rft.eisbn_list=1467349089&rft.eisbn_list=1467349100&rft.eisbn_list=9781467349086&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6544927&rfr_iscdi=true |
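The description says the optimizer uses information extracted from UDFs by static code analysis to decide whether operators can be reordered, but the record does not say what that information is. The sketch below assumes, purely for illustration, that each record-at-a-time UDF is summarized by the sets of record fields it reads and writes, and applies a simplified conflict check to push a filter-like UDF toward the source of a linear pipeline (a basic form of filter push-down). `UdfSummary`, `can_swap`, `push_filter_down`, and all field names are hypothetical; this is not the authors' implementation, and their actual reordering conditions may rely on further properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfSummary:
    """Hypothetical summary that a static analyzer might derive from a UDF."""
    name: str
    reads: frozenset   # record fields the UDF may read (assumed)
    writes: frozenset  # record fields the UDF may add, modify, or remove (assumed)


def can_swap(first: UdfSummary, second: UdfSummary) -> bool:
    """Simplified, conservative check: two adjacent record-at-a-time UDFs are
    treated as reorderable if neither writes a field the other reads and they
    never write the same field."""
    no_read_write = not (first.writes & second.reads) and not (second.writes & first.reads)
    no_write_write = not (first.writes & second.writes)
    return no_read_write and no_write_write


def push_filter_down(plan: list, filter_idx: int) -> list:
    """Move the UDF at filter_idx as early as possible in a linear pipeline,
    swapping it only across neighbours that can_swap() accepts."""
    plan = list(plan)  # work on a copy; the input plan is left unchanged
    i = filter_idx
    while i > 0 and can_swap(plan[i - 1], plan[i]):
        plan[i - 1], plan[i] = plan[i], plan[i - 1]
        i -= 1
    return plan


def fields(*names: str) -> frozenset:
    return frozenset(names)


# Hypothetical pipeline: parse raw lines, compute a net price, then keep EU rows.
parse = UdfSummary("parse", reads=fields("line"), writes=fields("price", "qty", "region"))
discount = UdfSummary("discount", reads=fields("price"), writes=fields("net"))
only_eu = UdfSummary("only_eu", reads=fields("region"), writes=fields())

plan = push_filter_down([parse, discount, only_eu], filter_idx=2)
print([op.name for op in plan])  # ['parse', 'only_eu', 'discount']
```

Here the filter moves ahead of `discount`, whose write set does not touch `region`, but it stops behind `parse`, which produces the `region` field the filter reads. A real optimizer would derive such summaries conservatively from the UDF code and would also have to account for operators with multiple inputs, such as joins.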