Double random forest

Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to choose from for RF fitting is the nodesize, which determines the individual tree size. In this paper, we begin with the observation that for many data sets (34...

Detailed description

Bibliographic details
Published in:	Machine learning 2020-08, Vol.109 (8), p.1569-1586
Main authors:	Han, Sunwoo; Kim, Hyunjoong; Lee, Yung-Seop
Format:	Article
Language:	eng
Subjects:	Accuracy; Algorithms; Artificial Intelligence; Classification; Computer Science; Control; Datasets; Decision trees; Default; Experiments; Machine Learning; Mechatronics; Medical research; Methods; Natural Language Processing (NLP); Parameters; Robotics; Simulation and Modeling
Online access:	Full text

container_end_page	1586
container_issue	8
container_start_page	1569
container_title	Machine learning
container_volume	109
creator	Han, Sunwoo ; Kim, Hyunjoong ; Lee, Yung-Seop
description	Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to choose from for RF fitting is the nodesize, which determines the individual tree size. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation leads to the idea that prediction accuracy could be further improved if we find a way to generate even bigger trees than the ones with a minimum nodesize. In other words, the largest tree created with the minimum nodesize parameter may not be sufficiently large for the best performance of RF. To produce bigger trees than those by RF, we propose a new classification ensemble method called double random forest (DRF). The new method uses bootstrap on each node during the tree creation process, instead of just bootstrapping once on the root node as in RF. This method, in turn, provides an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we have successfully demonstrated that DRF provides more accurate predictions than RF.
doi_str_mv	10.1007/s10994-020-05889-1
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2547179570</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2547179570</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-995dbd5946c531d92bd7edb9eb00920bc79509541828373415936d0975106d9e3</originalsourceid><addsrcrecordid>eNp9kEFLAzEQhYMouFZvnjwVPEdnkp0kc5RqVSh40XPoblKxtLs16R7890ZX8NbTMPB978ET4grhBgHsbUZgriUokEDOscQjUSFZXV5Dx6IC50gaVHQqznJeA4AyzlTi8r4fmk2cpmUX-u101aeY9-fiZLXc5Hjxdyfibf7wOnuSi5fH59ndQraacS-ZKTSBuDYtaQysmmBjaDg2AKygaS0TMNXolNNW10isTQC2hGACRz0R12PuLvWfQyn2635IXan0imqLxbdwkKo1aW0cq0KpkWpTn3OKK79LH9tl-vII_mcjP27ky0b-dyOPRdKjlAvcvcf0H33A-gaQzGUP</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2435336892</pqid></control><display><type>article</type><title>Double random forest</title><source>SpringerNature Journals</source><creator>Han, Sunwoo ; Kim, Hyunjoong ; Lee, Yung-Seop</creator><creatorcontrib>Han, Sunwoo ; Kim, Hyunjoong ; Lee, Yung-Seop</creatorcontrib><description>Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to choose from for RF fitting is the nodesize, which determines the individual tree size. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation leads to the idea that prediction accuracy could be further improved if we find a way to generate even bigger trees than the ones with a minimum nodesize. In other words, the largest tree created with the minimum nodesize parameter may not be sufficiently large for the best performance of RF. To produce bigger trees than those by RF, we propose a new classification ensemble method called double random forest (DRF). 
The new method uses bootstrap on each node during the tree creation process, instead of just bootstrapping once on the root node as in RF. This method, in turn, provides an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we have successfully demonstrated that DRF provides more accurate predictions than RF.</description><identifier>ISSN: 0885-6125</identifier><identifier>EISSN: 1573-0565</identifier><identifier>DOI: 10.1007/s10994-020-05889-1</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accuracy ; Algorithms ; Artificial Intelligence ; Classification ; Computer Science ; Control ; Datasets ; Decision trees ; Default ; Experiments ; Machine Learning ; Mechatronics ; Medical research ; Methods ; Natural Language Processing (NLP) ; Parameters ; Robotics ; Simulation and Modeling</subject><ispartof>Machine learning, 2020-08, Vol.109 (8), p.1569-1586</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 
2020.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-995dbd5946c531d92bd7edb9eb00920bc79509541828373415936d0975106d9e3</citedby><cites>FETCH-LOGICAL-c391t-995dbd5946c531d92bd7edb9eb00920bc79509541828373415936d0975106d9e3</cites><orcidid>0000-0001-6761-6318</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10994-020-05889-1$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10994-020-05889-1$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Han, Sunwoo</creatorcontrib><creatorcontrib>Kim, Hyunjoong</creatorcontrib><creatorcontrib>Lee, Yung-Seop</creatorcontrib><title>Double random forest</title><title>Machine learning</title><addtitle>Mach Learn</addtitle><description>Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to choose from for RF fitting is the nodesize, which determines the individual tree size. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation leads to the idea that prediction accuracy could be further improved if we find a way to generate even bigger trees than the ones with a minimum nodesize. In other words, the largest tree created with the minimum nodesize parameter may not be sufficiently large for the best performance of RF. To produce bigger trees than those by RF, we propose a new classification ensemble method called double random forest (DRF). 
The new method uses bootstrap on each node during the tree creation process, instead of just bootstrapping once on the root node as in RF. This method, in turn, provides an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we have successfully demonstrated that DRF provides more accurate predictions than RF.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Classification</subject><subject>Computer Science</subject><subject>Control</subject><subject>Datasets</subject><subject>Decision trees</subject><subject>Default</subject><subject>Experiments</subject><subject>Machine Learning</subject><subject>Mechatronics</subject><subject>Medical research</subject><subject>Methods</subject><subject>Natural Language Processing (NLP)</subject><subject>Parameters</subject><subject>Robotics</subject><subject>Simulation and Modeling</subject><issn>0885-6125</issn><issn>1573-0565</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kEFLAzEQhYMouFZvnjwVPEdnkp0kc5RqVSh40XPoblKxtLs16R7890ZX8NbTMPB978ET4grhBgHsbUZgriUokEDOscQjUSFZXV5Dx6IC50gaVHQqznJeA4AyzlTi8r4fmk2cpmUX-u101aeY9-fiZLXc5Hjxdyfibf7wOnuSi5fH59ndQraacS-ZKTSBuDYtaQysmmBjaDg2AKygaS0TMNXolNNW10isTQC2hGACRz0R12PuLvWfQyn2635IXan0imqLxbdwkKo1aW0cq0KpkWpTn3OKK79LH9tl-vII_mcjP27ky0b-dyOPRdKjlAvcvcf0H33A-gaQzGUP</recordid><startdate>20200801</startdate><enddate>20200801</enddate><creator>Han, Sunwoo</creator><creator>Kim, Hyunjoong</creator><creator>Lee, Yung-Seop</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M2P</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0001-6761-6318</orcidid></search><sort><creationdate>20200801</creationdate><title>Double random forest</title><author>Han, Sunwoo ; Kim, Hyunjoong ; Lee, Yung-Seop</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-995dbd5946c531d92bd7edb9eb00920bc79509541828373415936d0975106d9e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Classification</topic><topic>Computer Science</topic><topic>Control</topic><topic>Datasets</topic><topic>Decision trees</topic><topic>Default</topic><topic>Experiments</topic><topic>Machine Learning</topic><topic>Mechatronics</topic><topic>Medical research</topic><topic>Methods</topic><topic>Natural Language Processing (NLP)</topic><topic>Parameters</topic><topic>Robotics</topic><topic>Simulation and Modeling</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Sunwoo</creatorcontrib><creatorcontrib>Kim, Hyunjoong</creatorcontrib><creatorcontrib>Lee, Yung-Seop</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer 
and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Machine learning</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Sunwoo</au><au>Kim, Hyunjoong</au><au>Lee, Yung-Seop</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Double random forest</atitle><jtitle>Machine learning</jtitle><stitle>Mach Learn</stitle><date>2020-08-01</date><risdate>2020</risdate><volume>109</volume><issue>8</issue><spage>1569</spage><epage>1586</epage><pages>1569-1586</pages><issn>0885-6125</issn><eissn>1573-0565</eissn><abstract>Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to choose from for RF fitting is the nodesize, which determines the individual tree size. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation leads to the idea that prediction accuracy could be further improved if we find a way to generate even bigger trees than the ones with a minimum nodesize. In other words, the largest tree created with the minimum nodesize parameter may not be sufficiently large for the best performance of RF. To produce bigger trees than those by RF, we propose a new classification ensemble method called double random forest (DRF). The new method uses bootstrap on each node during the tree creation process, instead of just bootstrapping once on the root node as in RF. This method, in turn, provides an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we have successfully demonstrated that DRF provides more accurate predictions than RF.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10994-020-05889-1</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0001-6761-6318</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0885-6125
ispartof	Machine learning, 2020-08, Vol.109 (8), p.1569-1586
issn	0885-6125 1573-0565
language	eng
recordid	cdi_proquest_journals_2547179570
source	SpringerNature Journals
subjects	Accuracy ; Algorithms ; Artificial Intelligence ; Classification ; Computer Science ; Control ; Datasets ; Decision trees ; Default ; Experiments ; Machine Learning ; Mechatronics ; Medical research ; Methods ; Natural Language Processing (NLP) ; Parameters ; Robotics ; Simulation and Modeling
title	Double random forest
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T04%3A16%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Double%20random%20forest&rft.jtitle=Machine%20learning&rft.au=Han,%20Sunwoo&rft.date=2020-08-01&rft.volume=109&rft.issue=8&rft.spage=1569&rft.epage=1586&rft.pages=1569-1586&rft.issn=0885-6125&rft.eissn=1573-0565&rft_id=info:doi/10.1007/s10994-020-05889-1&rft_dat=%3Cproquest_cross%3E2547179570%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2435336892&rft_id=info:pmid/&rfr_iscdi=true
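
The description field above says that DRF bootstraps on each node during tree creation, rather than once at the root as in RF. The following is a minimal Python sketch of that idea, not the authors' implementation. Several details are assumptions that the abstract does not state: the split is searched on a bootstrap resample of the node's data and then applied to the node's original samples, the root receives the full training set, splits are scored by Gini impurity, and predictions are combined by majority vote. All function names (`grow_tree`, `fit_forest`, `predict_forest`) are hypothetical.

```python
import numpy as np

def gini(y):
    """Gini impurity of a vector of non-negative integer class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, feature_ids):
    """Exhaustive search for the (feature, threshold) with the lowest weighted Gini."""
    best_j, best_t, best_score = None, None, np.inf
    for j in feature_ids:
        values = np.unique(X[:, j])
        # Candidate thresholds are midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t

def grow_tree(X, y, rng, mtry, nodesize=1):
    """Grow one tree, drawing a fresh bootstrap sample at every node (assumed DRF step)."""
    leaf = {"leaf": True, "label": int(np.bincount(y).argmax())}
    if len(y) <= nodesize or len(np.unique(y)) == 1:
        return leaf
    boot = rng.integers(0, len(y), size=len(y))            # node-level bootstrap indices
    feats = rng.choice(X.shape[1], size=mtry, replace=False)
    j, t = best_split(X[boot], y[boot], feats)              # split chosen on the resample
    if j is None:                                           # resample had no usable split
        return leaf
    left = X[:, j] <= t                                     # route the ORIGINAL node data
    if left.all() or not left.any():
        return leaf
    return {"leaf": False, "feature": int(j), "threshold": float(t),
            "left": grow_tree(X[left], y[left], rng, mtry, nodesize),
            "right": grow_tree(X[~left], y[~left], rng, mtry, nodesize)}

def predict_tree(node, x):
    """Follow the splits from the root down to a leaf and return its label."""
    while not node["leaf"]:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

def fit_forest(X, y, n_trees=25, mtry=None, seed=0):
    """Fit an ensemble; each tree starts from the full training set at its root."""
    rng = np.random.default_rng(seed)
    mtry = mtry or max(1, int(np.sqrt(X.shape[1])))
    return [grow_tree(X, y, rng, mtry) for _ in range(n_trees)]

def predict_forest(trees, X):
    """Majority vote over the trees for every row of X."""
    votes = np.array([[predict_tree(tree, x) for x in X] for tree in trees])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Because each node passes its original samples to its children, rather than the unique cases of one root-level bootstrap sample, a tree in this sketch has more distinct observations to keep splitting on, which is one plausible reading of how the method yields bigger trees. A call such as `predict_forest(fit_forest(X, y), X_new)` would exercise it, with `X` a 2-D float array and `y` a vector of non-negative integer labels.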