ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets

Big data processing frameworks (e.g., Spark, Storm) have been extensively used for massive data processing in the industry. To improve the performance and robustness of these frameworks, developers provide users with highly-configurable parameters. Due to the high-dimensional parameter space and com...

Detailed description

Bibliographic details
Published in: IEEE access 2020-01, Vol.8, p.1-1
Main authors: Li, Mingyu; Liu, Zhiqiang; Shi, Xuanhua; Jin, Hai
Format: Article
Language: eng
Online access: Full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE access
container_volume 8
creator Li, Mingyu
Liu, Zhiqiang
Shi, Xuanhua
Jin, Hai
description Big data processing frameworks (e.g., Spark, Storm) have been extensively used for massive data processing in industry. To improve the performance and robustness of these frameworks, developers provide users with highly configurable parameters. Due to the high-dimensional parameter space and the complicated interactions between parameters, manual tuning is time-consuming and ineffective. Building performance-prediction models for big data frameworks is challenging for two reasons: (1) the significant time required to collect training data and (2) the poor accuracy of the prediction model when training data are limited. To meet this challenge, we propose an auto-tuning configuration parameters system (ATCS), a new auto-tuning approach based on Generative Adversarial Nets (GAN). ATCS can build a performance prediction model with less training data and without sacrificing model accuracy. Moreover, an optimized Genetic Algorithm (GA) is used in ATCS to explore the parameter space for optimum solutions. To evaluate the effectiveness of ATCS, we select five frequently used Spark workloads, each of which runs on five data sets of different sizes. The results demonstrate that ATCS improves the performance of these workloads compared to the default configurations, with a performance increase of 3.5× on average and a maximum of 6.9×. The experimental results also demonstrate that, to obtain similar model accuracy, ATCS needs only 6% of the training data required by a Deep Neural Network (DNN), 13% of that required by a Support Vector Machine (SVM), and 18% of that required by a Decision Tree (DT). Moreover, compared to other machine learning models, the average performance increase of ATCS is 1.7× that of DNN, 1.6× that of SVM, and 1.7× that of DT on the five typical Spark programs.
doi_str_mv 10.1109/ACCESS.2020.2979812
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2020_2979812</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9031432</ieee_id><doaj_id>oai_doaj_org_article_47ccd89dd1464569923b95031def32d2</doaj_id><sourcerecordid>2454875079</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-7165d2c4c8756c4e2dfa09cfd2ff76294df005e14805488947db2abea834690f3</originalsourceid><addsrcrecordid>eNpNUctKJDEULURhxPEL3ARcV5tnJXFX1vgC0UW3u4GQTm6atG3FSaqU-fuptkTmbu7lcB4XTlWdEbwgBOuLtuuul8sFxRQvqJZaEXpQHVPS6JoJ1hz-d_-oTkvZ4mnUBAl5XP1uV93yErXjkOrV2Md-g7rUh7gZsx1i6gtKAV3FDfplB4tusn2Fj5RfCrqyBTxKPbqFHvbcd0Ctf4dcbI52hx5hKD-ro2B3BU6_9kn1fHO96u7qh6fb-659qB3HaqglaYSnjjslReM4UB8s1i54GoJsqOY-YCyAcIUFV0pz6dfUrsEqxhuNAzup7mdfn-zWvOX4avNfk2w0n0DKG2PzEN0ODJfOeaW9J7zhotGasrUWmBEPgVFPJ6_z2estpz8jlMFs05j76X1D-RQvBZZ6YrGZ5XIqJUP4TiXY7Fsxcytm34r5amVSnc2qCADfCj2lc0bZP9uxhrs</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2454875079</pqid></control><display><type>article</type><title>ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets</title><source>IEEE Open Access Journals</source><source>TestCollectionTL3OpenAccess</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Li, Mingyu ; Liu, Zhiqiang ; Shi, Xuanhua ; Jin, Hai</creator><creatorcontrib>Li, Mingyu ; Liu, Zhiqiang ; Shi, Xuanhua ; Jin, Hai</creatorcontrib><description>Big data processing frameworks (e.g., Spark, Storm) have been extensively used for massive data processing in the industry. To improve the performance and robustness of these frameworks, developers provide users with highly-configurable parameters. Due to the high-dimensional parameter space and complicated interactions of parameters, manual tuning of parameters is time-consuming and ineffective. 
Building performance-predicting models for big data frameworks is challenging for several reasons: (1) the significant time required to collect training data and (2) the poor accuracy of the prediction model when training data are limited. To meet this challenge, we proposes an auto-tuning configuration parameters system (ATCS), a new auto-tuning approach based on Generative Adversarial Nets (GAN). ATCS can build a performance prediction model with less training data and without sacrificing model accuracy. Moreover, an optimized Genetic Algorithm (GA) is used in ATCS to explore the parameter space for optimum solutions. To prove the effectiveness of ATCS, we select five frequently-used workloads in Spark, each of which runs on five different sized data sets. The results demonstrate that ATCS improves the performance of five frequently-used Spark workloads compared to the default configurations. We achieved a performance increase of 3.5× on average, with a maximum of 6.9×. To obtain similar model accuracy, experiment results also demonstrate that the quantity of ATCS training data is only 6% of Deep Neural Network (DNN) data, 13% of Support Vector Machine (SVM) data, 18% of Decision Tree (DT) data. 
Moreover, compared to other machine learning models, the average performance increase of ATCS is 1.7× that of DNN, 1.6× that of SVM, 1.7× that of DT on the five typical Spark programs.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2020.2979812</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Accuracy ; Artificial neural networks ; Automatic Tune Parameters ; Big Data ; Configuration management ; Data processing ; Decision trees ; Generative Adversarial Nets ; Genetic Algorithm ; Genetic algorithms ; Machine learning ; Mathematical models ; Model accuracy ; Performance enhancement ; Performance prediction ; Prediction models ; Spark ; Support vector machines ; Training ; Tuning ; Workload ; Workloads</subject><ispartof>IEEE access, 2020-01, Vol.8, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-7165d2c4c8756c4e2dfa09cfd2ff76294df005e14805488947db2abea834690f3</citedby><cites>FETCH-LOGICAL-c408t-7165d2c4c8756c4e2dfa09cfd2ff76294df005e14805488947db2abea834690f3</cites><orcidid>0000-0002-3934-7605 ; 0000-0003-0541-5515 ; 0000-0001-8451-8656 ; 0000-0002-7878-1658</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9031432$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2095,27612,27903,27904,54911</link.rule.ids></links><search><creatorcontrib>Li, Mingyu</creatorcontrib><creatorcontrib>Liu, Zhiqiang</creatorcontrib><creatorcontrib>Shi, Xuanhua</creatorcontrib><creatorcontrib>Jin, Hai</creatorcontrib><title>ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets</title><title>IEEE access</title><addtitle>Access</addtitle><description>Big data processing frameworks (e.g., Spark, Storm) have been extensively used for massive data processing in the industry. To improve the performance and robustness of these frameworks, developers provide users with highly-configurable parameters. Due to the high-dimensional parameter space and complicated interactions of parameters, manual tuning of parameters is time-consuming and ineffective. Building performance-predicting models for big data frameworks is challenging for several reasons: (1) the significant time required to collect training data and (2) the poor accuracy of the prediction model when training data are limited. To meet this challenge, we proposes an auto-tuning configuration parameters system (ATCS), a new auto-tuning approach based on Generative Adversarial Nets (GAN). 
ATCS can build a performance prediction model with less training data and without sacrificing model accuracy. Moreover, an optimized Genetic Algorithm (GA) is used in ATCS to explore the parameter space for optimum solutions. To prove the effectiveness of ATCS, we select five frequently-used workloads in Spark, each of which runs on five different sized data sets. The results demonstrate that ATCS improves the performance of five frequently-used Spark workloads compared to the default configurations. We achieved a performance increase of 3.5× on average, with a maximum of 6.9×. To obtain similar model accuracy, experiment results also demonstrate that the quantity of ATCS training data is only 6% of Deep Neural Network (DNN) data, 13% of Support Vector Machine (SVM) data, 18% of Decision Tree (DT) data. Moreover, compared to other machine learning models, the average performance increase of ATCS is 1.7× that of DNN, 1.6× that of SVM, 1.7× that of DT on the five typical Spark programs.</description><subject>Accuracy</subject><subject>Artificial neural networks</subject><subject>Automatic Tune Parameters</subject><subject>Big Data</subject><subject>Configuration management</subject><subject>Data processing</subject><subject>Decision trees</subject><subject>Generative Adversarial Nets</subject><subject>Genetic Algorithm</subject><subject>Genetic algorithms</subject><subject>Machine learning</subject><subject>Mathematical models</subject><subject>Model accuracy</subject><subject>Performance enhancement</subject><subject>Performance prediction</subject><subject>Prediction models</subject><subject>Spark</subject><subject>Support vector 
machines</subject><subject>Training</subject><subject>Tuning</subject><subject>Workload</subject><subject>Workloads</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUctKJDEULURhxPEL3ARcV5tnJXFX1vgC0UW3u4GQTm6atG3FSaqU-fuptkTmbu7lcB4XTlWdEbwgBOuLtuuul8sFxRQvqJZaEXpQHVPS6JoJ1hz-d_-oTkvZ4mnUBAl5XP1uV93yErXjkOrV2Md-g7rUh7gZsx1i6gtKAV3FDfplB4tusn2Fj5RfCrqyBTxKPbqFHvbcd0Ctf4dcbI52hx5hKD-ro2B3BU6_9kn1fHO96u7qh6fb-659qB3HaqglaYSnjjslReM4UB8s1i54GoJsqOY-YCyAcIUFV0pz6dfUrsEqxhuNAzup7mdfn-zWvOX4avNfk2w0n0DKG2PzEN0ODJfOeaW9J7zhotGasrUWmBEPgVFPJ6_z2estpz8jlMFs05j76X1D-RQvBZZ6YrGZ5XIqJUP4TiXY7Fsxcytm34r5amVSnc2qCADfCj2lc0bZP9uxhrs</recordid><startdate>20200101</startdate><enddate>20200101</enddate><creator>Li, Mingyu</creator><creator>Liu, Zhiqiang</creator><creator>Shi, Xuanhua</creator><creator>Jin, Hai</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-3934-7605</orcidid><orcidid>https://orcid.org/0000-0003-0541-5515</orcidid><orcidid>https://orcid.org/0000-0001-8451-8656</orcidid><orcidid>https://orcid.org/0000-0002-7878-1658</orcidid></search><sort><creationdate>20200101</creationdate><title>ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets</title><author>Li, Mingyu ; Liu, Zhiqiang ; Shi, Xuanhua ; Jin, Hai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-7165d2c4c8756c4e2dfa09cfd2ff76294df005e14805488947db2abea834690f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Artificial neural networks</topic><topic>Automatic Tune Parameters</topic><topic>Big Data</topic><topic>Configuration management</topic><topic>Data processing</topic><topic>Decision trees</topic><topic>Generative Adversarial Nets</topic><topic>Genetic Algorithm</topic><topic>Genetic algorithms</topic><topic>Machine learning</topic><topic>Mathematical models</topic><topic>Model accuracy</topic><topic>Performance enhancement</topic><topic>Performance prediction</topic><topic>Prediction models</topic><topic>Spark</topic><topic>Support vector machines</topic><topic>Training</topic><topic>Tuning</topic><topic>Workload</topic><topic>Workloads</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Mingyu</creatorcontrib><creatorcontrib>Liu, Zhiqiang</creatorcontrib><creatorcontrib>Shi, Xuanhua</creatorcontrib><creatorcontrib>Jin, Hai</creatorcontrib><collection>IEEE 
All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>TestCollectionTL3OpenAccess</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Mingyu</au><au>Liu, Zhiqiang</au><au>Shi, Xuanhua</au><au>Jin, Hai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2020-01-01</date><risdate>2020</risdate><volume>8</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Big data processing frameworks (e.g., Spark, Storm) have been extensively used for massive data processing in the industry. To improve the performance and robustness of these frameworks, developers provide users with highly-configurable parameters. Due to the high-dimensional parameter space and complicated interactions of parameters, manual tuning of parameters is time-consuming and ineffective. 
Building performance-predicting models for big data frameworks is challenging for several reasons: (1) the significant time required to collect training data and (2) the poor accuracy of the prediction model when training data are limited. To meet this challenge, we proposes an auto-tuning configuration parameters system (ATCS), a new auto-tuning approach based on Generative Adversarial Nets (GAN). ATCS can build a performance prediction model with less training data and without sacrificing model accuracy. Moreover, an optimized Genetic Algorithm (GA) is used in ATCS to explore the parameter space for optimum solutions. To prove the effectiveness of ATCS, we select five frequently-used workloads in Spark, each of which runs on five different sized data sets. The results demonstrate that ATCS improves the performance of five frequently-used Spark workloads compared to the default configurations. We achieved a performance increase of 3.5× on average, with a maximum of 6.9×. To obtain similar model accuracy, experiment results also demonstrate that the quantity of ATCS training data is only 6% of Deep Neural Network (DNN) data, 13% of Support Vector Machine (SVM) data, 18% of Decision Tree (DT) data. Moreover, compared to other machine learning models, the average performance increase of ATCS is 1.7× that of DNN, 1.6× that of SVM, 1.7× that of DT on the five typical Spark programs.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2020.2979812</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-3934-7605</orcidid><orcidid>https://orcid.org/0000-0003-0541-5515</orcidid><orcidid>https://orcid.org/0000-0001-8451-8656</orcidid><orcidid>https://orcid.org/0000-0002-7878-1658</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2020-01, Vol.8, p.1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2020_2979812
source IEEE Open Access Journals; TestCollectionTL3OpenAccess; EZB-FREE-00999 freely available EZB journals
subjects Accuracy
Artificial neural networks
Automatic Tune Parameters
Big Data
Configuration management
Data processing
Decision trees
Generative Adversarial Nets
Genetic Algorithm
Genetic algorithms
Machine learning
Mathematical models
Model accuracy
Performance enhancement
Performance prediction
Prediction models
Spark
Support vector machines
Training
Tuning
Workload
Workloads
title ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T07%3A28%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ATCS:%20Auto-Tuning%20Configurations%20of%20Big%20Data%20Frameworks%20Based%20on%20Generative%20Adversarial%20Nets&rft.jtitle=IEEE%20access&rft.au=Li,%20Mingyu&rft.date=2020-01-01&rft.volume=8&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2020.2979812&rft_dat=%3Cproquest_cross%3E2454875079%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2454875079&rft_id=info:pmid/&rft_ieee_id=9031432&rft_doaj_id=oai_doaj_org_article_47ccd89dd1464569923b95031def32d2&rfr_iscdi=true