Hadoop Performance Prediction Model Based on Random Forest

MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned. This is not only time-consuming but also error-prone. In this paper,...

Bibliographic Details
Published in:	ZTE Communications 2013-06, Vol.11 (2), p.38-44
Main Authors:	Bei, Z; Yu, Z; Zhang, H; Xu, C; Feng, S; Dong, Z
Format:	Article
Language:	eng
Subjects:	Algorithms; Forests; Freeware; Mathematical models; Performance prediction; Programming; p系统; Workload; 性能预测; 机器学习算法; 森林; 模型基; 测试套件; 配置参数; 随机
Online Access:	Full text

container_end_page	44
container_issue	2
container_start_page	38
container_title	ZTE Communications
container_volume	11
creator	Bei, Z; Yu, Z; Zhang, H; Xu, C; Feng, S; Dong, Z
description	MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned. This is not only time-consuming but also error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, is used to predict the performance of a Hadoop system according to the system's configuration parameters. RFMS is created from 2000 distinct fine-grained performance observations with different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that the prediction accuracy of RFMS is 95% on average and reaches up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
doi_str_mv	10.3969/j.issn.1673-5188.2013.02.006
format	Article
fullrecord	<record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_zxtxjs_e201302007</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><cqvip_id>46536953</cqvip_id><wanfj_id>zxtxjs_e201302007</wanfj_id><sourcerecordid>zxtxjs_e201302007</sourcerecordid><originalsourceid>FETCH-LOGICAL-c937-8886d6c04e8c51521bb0b1e2d981a20ddabdbfc1ab1e070a5310f43f1cb12c933</originalsourceid><addsrcrecordid>eNo9j01LAzEQhnNQsNT-hxU86GHXyWY3m3jTYq1QsUjvS762puwmbdJi9dcbqXga5uXhmXkRusZQEE753aawMboC04bkNWasKAGTAsoCgJ6h0X9-gSYxWgkAnLK6oiN0Pxfa-222NKHzYRBOmWwZjLZqb73LXr02ffYootFZWt-F037IZj6YuL9E553oo5n8zTFazZ5W03m-eHt-mT4scsVJkzPGqKYKKsNUjesSSwkSm1JzhkUJWgupZaewSCE0IGqCoatIh5XEZTKQMbo9aT-F64Rbtxt_CC4dbL-P--Mmtua3LJQATWJvTuw2-N0h_dgONirT98IZf4gtrirWNIRjntCrE6o-vFvvbBJvgx1E-GorWhPKa0J-ADayZi0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1448773919</pqid></control><display><type>article</type><title>Hadoop Performance Prediction Model Based on Random Forest</title><source>Alma/SFX Local Collection</source><creator>Bei, Z ; Yu, Z ; Zhang, H ; Xu, C ; Feng, S ; Dong, Z</creator><creatorcontrib>Bei, Z ; Yu, Z ; Zhang, H ; Xu, C ; Feng, S ; Dong, Z</creatorcontrib><description>MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tunned. This is not only time-consuming but also error-pron. In this paper, we propose a new performance model based on random forest, a recently devel- oped machine-learning algorithm. The model, called RFMS, is used to predict the performance of a Hadoop system according to the system＇ s configuration parameters. RFMS is created from 2000 distinct fine-grained performance observations with different Hadoop configurations. 
We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that the prediction accuracy of RFMS achieves 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.</description><identifier>ISSN: 1673-5188</identifier><identifier>DOI: 10.3969/j.issn.1673-5188.2013.02.006</identifier><language>eng</language><publisher>Wayne State University, Detroit, Michigan 48202, USA%Cloud Computing and IT Institute of ZTE Corporation, Nanjing 210012, China</publisher><subject>Algorithms ; Forests ; Freeware ; Mathematical models ; Performance prediction ; Programming ; p系统 ; Workload ; 性能预测 ; 机器学习算法 ; 森林 ; 模型基 ; 测试套件 ; 配置参数 ; 随机</subject><ispartof>ZTE Communications, 2013-06, Vol.11 (2), p.38-44</ispartof><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://image.cqvip.com/vip1000/qk/70429X/70429X.jpg</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Bei, Z</creatorcontrib><creatorcontrib>Yu, Z</creatorcontrib><creatorcontrib>Zhang, H</creatorcontrib><creatorcontrib>Xu, C</creatorcontrib><creatorcontrib>Feng, S</creatorcontrib><creatorcontrib>Dong, Z</creatorcontrib><title>Hadoop Performance Prediction Model Based on Random Forest</title><title>ZTE Communications</title><addtitle>ZTE Communications</addtitle><description>MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tunned. This is not only time-consuming but also error-pron. 
In this paper, we propose a new performance model based on random forest, a recently devel- oped machine-learning algorithm. The model, called RFMS, is used to predict the performance of a Hadoop system according to the system＇ s configuration parameters. RFMS is created from 2000 distinct fine-grained performance observations with different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that the prediction accuracy of RFMS achieves 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.</description><subject>Algorithms</subject><subject>Forests</subject><subject>Freeware</subject><subject>Mathematical models</subject><subject>Performance prediction</subject><subject>Programming</subject><subject>p系统</subject><subject>Workload</subject><subject>性能预测</subject><subject>机器学习算法</subject><subject>森林</subject><subject>模型基</subject><subject>测试套件</subject><subject>配置参数</subject><subject>随机</subject><issn>1673-5188</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNo9j01LAzEQhnNQsNT-hxU86GHXyWY3m3jTYq1QsUjvS762puwmbdJi9dcbqXga5uXhmXkRusZQEE753aawMboC04bkNWasKAGTAsoCgJ6h0X9-gSYxWgkAnLK6oiN0Pxfa-222NKHzYRBOmWwZjLZqb73LXr02ffYootFZWt-F037IZj6YuL9E553oo5n8zTFazZ5W03m-eHt-mT4scsVJkzPGqKYKKsNUjesSSwkSm1JzhkUJWgupZaewSCE0IGqCoatIh5XEZTKQMbo9aT-F64Rbtxt_CC4dbL-P--Mmtua3LJQATWJvTuw2-N0h_dgONirT98IZf4gtrirWNIRjntCrE6o-vFvvbBJvgx1E-GorWhPKa0J-ADayZi0</recordid><startdate>20130601</startdate><enddate>20130601</enddate><creator>Bei, Z</creator><creator>Yu, Z</creator><creator>Zhang, H</creator><creator>Xu, C</creator><creator>Feng, S</creator><creator>Dong, Z</creator><general>Wayne State University, Detroit, Michigan 48202, USA%Cloud Computing and IT Institute of ZTE Corporation, Nanjing 210012, 
China</general><general>Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China%Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China</general><scope>2RA</scope><scope>92L</scope><scope>CQIGP</scope><scope>W92</scope><scope>~WA</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20130601</creationdate><title>Hadoop Performance Prediction Model Based on Random Forest</title><author>Bei, Z ; Yu, Z ; Zhang, H ; Xu, C ; Feng, S ; Dong, Z</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c937-8886d6c04e8c51521bb0b1e2d981a20ddabdbfc1ab1e070a5310f43f1cb12c933</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Forests</topic><topic>Freeware</topic><topic>Mathematical models</topic><topic>Performance prediction</topic><topic>Programming</topic><topic>p系统</topic><topic>Workload</topic><topic>性能预测</topic><topic>机器学习算法</topic><topic>森林</topic><topic>模型基</topic><topic>测试套件</topic><topic>配置参数</topic><topic>随机</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bei, Z</creatorcontrib><creatorcontrib>Yu, Z</creatorcontrib><creatorcontrib>Zhang, H</creatorcontrib><creatorcontrib>Xu, C</creatorcontrib><creatorcontrib>Feng, S</creatorcontrib><creatorcontrib>Dong, Z</creatorcontrib><collection>中文科技期刊数据库</collection><collection>中文科技期刊数据库-CALIS站点</collection><collection>中文科技期刊数据库-7.0平台</collection><collection>中文科技期刊数据库-工程技术</collection><collection>中文科技期刊数据库- 镜像站点</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications 
Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>ZTE Communications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bei, Z</au><au>Yu, Z</au><au>Zhang, H</au><au>Xu, C</au><au>Feng, S</au><au>Dong, Z</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hadoop Performance Prediction Model Based on Random Forest</atitle><jtitle>ZTE Communications</jtitle><addtitle>ZTE Communications</addtitle><date>2013-06-01</date><risdate>2013</risdate><volume>11</volume><issue>2</issue><spage>38</spage><epage>44</epage><pages>38-44</pages><issn>1673-5188</issn><abstract>MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tunned. This is not only time-consuming but also error-pron. In this paper, we propose a new performance model based on random forest, a recently devel- oped machine-learning algorithm. The model, called RFMS, is used to predict the performance of a Hadoop system according to the system＇ s configuration parameters. RFMS is created from 2000 distinct fine-grained performance observations with different Hadoop configurations. 
We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that the prediction accuracy of RFMS achieves 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.</abstract><pub>Wayne State University, Detroit, Michigan 48202, USA%Cloud Computing and IT Institute of ZTE Corporation, Nanjing 210012, China</pub><doi>10.3969/j.issn.1673-5188.2013.02.006</doi><tpages>7</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1673-5188
ispartof	ZTE Communications, 2013-06, Vol.11 (2), p.38-44
issn	1673-5188
language	eng
recordid	cdi_wanfang_journals_zxtxjs_e201302007
source	Alma/SFX Local Collection
subjects	Algorithms; Forests; Freeware; Mathematical models; Performance prediction; Programming; p系统; Workload; 性能预测; 机器学习算法; 森林; 模型基; 测试套件; 配置参数; 随机
title	Hadoop Performance Prediction Model Based on Random Forest
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T01%3A08%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hadoop%20Performance%20Prediction%20Model%20Based%20on%20Random%20Forest&rft.jtitle=ZTE%20Communications&rft.au=Bei,%20Z&rft.date=2013-06-01&rft.volume=11&rft.issue=2&rft.spage=38&rft.epage=44&rft.pages=38-44&rft.issn=1673-5188&rft_id=info:doi/10.3969/j.issn.1673-5188.2013.02.006&rft_dat=%3Cwanfang_jour_proqu%3Ezxtxjs_e201302007%3C/wanfang_jour_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1448773919&rft_id=info:pmid/&rft_cqvip_id=46536953&rft_wanfj_id=zxtxjs_e201302007&rfr_iscdi=true