Hadoop Performance Prediction Model Based on Random Forest

MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be tuned manually. This is not only time-consuming but also error-prone. In this paper,...

Detailed description

Bibliographic details
Published in: ZTE Communications 2013-06, Vol. 11 (2), p. 38-44
Main authors: Bei, Z; Yu, Z; Zhang, H; Xu, C; Feng, S; Dong, Z
Format: Article
Language: English
Online access: Full text
container_end_page 44
container_issue 2
container_start_page 38
container_title ZTE Communications
container_volume 11
creator Bei, Z
Yu, Z
Zhang, H
Xu, C
Feng, S
Dong, Z
description MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be tuned manually, which is both time-consuming and error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, predicts the performance of a Hadoop system from the system's configuration parameters. RFMS is built from 2000 distinct fine-grained performance observations collected under different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that RFMS achieves an average prediction accuracy of 95% and a maximum of 99%. This highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
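The abstract describes RFMS only at a high level. As a rough illustration of the general technique, and not the authors' RFMS implementation, the following Python sketch fits a minimal random forest regressor (bootstrap-sampled regression trees with a random feature subset per split, predictions averaged across trees) that maps configuration settings to job runtime. Everything concrete here is an assumption: the three hypothetical features are loosely modeled on Hadoop settings such as io.sort.mb, mapred.reduce.tasks, and map-output compression; the runtime formula is synthetic; and "accuracy" is assumed to mean 1 - |predicted - measured| / measured. None of the paper's 2000 observations or its actual feature set are reproduced.

```python
import random
from statistics import mean


def build_tree(rows, targets, n_features, max_depth, min_leaf, rng):
    """Grow one regression tree by greedy squared-error reduction.

    A leaf is a float (mean target); an internal node is a tuple
    (feature_index, threshold, left_subtree, right_subtree).
    """
    if max_depth == 0 or len(rows) <= min_leaf or len(set(targets)) == 1:
        return mean(targets)
    # Random forest twist: only a random subset of features is considered per split.
    candidates = rng.sample(range(len(rows[0])), n_features)
    best = None
    for f in candidates:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:  # no feature in the subset can split these rows
        return mean(targets)
    _, f, threshold = best
    go_left = [r[f] <= threshold for r in rows]

    def subset(keep):
        return ([r for r, g in zip(rows, go_left) if g == keep],
                [t for t, g in zip(targets, go_left) if g == keep])

    left_rows, left_t = subset(True)
    right_rows, right_t = subset(False)
    return (f, threshold,
            build_tree(left_rows, left_t, n_features, max_depth - 1, min_leaf, rng),
            build_tree(right_rows, right_t, n_features, max_depth - 1, min_leaf, rng))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean runtime."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if x[f] <= threshold else right
    return node


class RandomForestRegressor:
    """Bagged regression trees with per-split feature subsampling."""

    def __init__(self, n_trees=20, n_features=2, max_depth=6, min_leaf=3, seed=0):
        self.n_trees, self.n_features = n_trees, n_features
        self.max_depth, self.min_leaf, self.seed = max_depth, min_leaf, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.n_features, self.max_depth, self.min_leaf, rng))
        return self

    def predict(self, x):
        # The forest prediction is the average of the individual tree predictions.
        return mean(predict_tree(tree, x) for tree in self.trees)


# Hypothetical configuration features: (io_sort_mb, reduce_tasks, compress_map_output).
# The runtime formula is invented for illustration; it is NOT the paper's data.
def synthetic_runtime(cfg, rng):
    io_sort_mb, reduce_tasks, compress = cfg
    return 100 + 2000 / io_sort_mb + 400 / reduce_tasks + (0 if compress else 30) + rng.gauss(0, 5)


data_rng = random.Random(42)
configs = [(data_rng.choice([50, 100, 150, 200]), data_rng.choice([1, 2, 4, 8, 16]),
            data_rng.choice([0, 1])) for _ in range(400)]
runtimes = [synthetic_runtime(c, data_rng) for c in configs]
train_X, train_y = configs[:300], runtimes[:300]
test_X, test_y = configs[300:], runtimes[300:]

model = RandomForestRegressor().fit(train_X, train_y)
# Assumed accuracy definition: 1 - |predicted - measured| / measured.
accuracies = [1 - abs(model.predict(x) - y) / y for x, y in zip(test_X, test_y)]
mean_accuracy = mean(accuracies)
print(f"mean accuracy {mean_accuracy:.1%}, best {max(accuracies):.1%}")
```

Averaging many decorrelated trees is what distinguishes a random forest from a single regression tree: each tree sees a different bootstrap sample and a different feature subset at each split, so individual overfitting tends to cancel out. With a trained model of this kind, candidate configurations could in principle be ranked by predicted runtime before any of them is run on a cluster.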
doi_str_mv 10.3969/j.issn.1673-5188.2013.02.006
format Article
publisher Wayne State University, Detroit, Michigan 48202, USA; Cloud Computing and IT Institute of ZTE Corporation, Nanjing 210012, China; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
rights Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
fulltext fulltext
identifier ISSN: 1673-5188
ispartof ZTE Communications, 2013-06, Vol.11 (2), p.38-44
issn 1673-5188
language eng
recordid cdi_wanfang_journals_zxtxjs_e201302007
source Alma/SFX Local Collection
subjects Algorithms
Forests
Freeware
Mathematical models
Performance prediction
Programming
p system
Workload
Performance prediction
Machine learning algorithms
Forest
Model-based
Test suite
Configuration parameters
Random
title Hadoop Performance Prediction Model Based on Random Forest
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T01%3A08%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hadoop%20Performance%20Prediction%20Model%20Based%20on%20Random%20Forest&rft.jtitle=ZTE%20Communications&rft.au=Bei,%20Z&rft.date=2013-06-01&rft.volume=11&rft.issue=2&rft.spage=38&rft.epage=44&rft.pages=38-44&rft.issn=1673-5188&rft_id=info:doi/10.3969/j.issn.1673-5188.2013.02.006&rft_dat=%3Cwanfang_jour_proqu%3Ezxtxjs_e201302007%3C/wanfang_jour_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1448773919&rft_id=info:pmid/&rft_cqvip_id=46536953&rft_wanfj_id=zxtxjs_e201302007&rfr_iscdi=true