On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications

Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its pe...

Bibliographic details
Main authors: Fawagreh, Khaled; Gaber, Mohamad Medhat; Elyan, Eyad
Format: Article
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Fawagreh, Khaled
Gaber, Mohamad Medhat
Elyan, Eyad
description Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is a significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well known diversity technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to produce an extension of RF termed CLUB-DRF that is much smaller in size than RF, and yet performs at least as good as RF, and mostly exhibits higher performance in terms of accuracy. The latter refers to a known technique called ensemble pruning. Experimental results on 15 real datasets from the UCI repository prove the superiority of our proposed extension over the traditional RF. Most of our experiments achieved at least 95% or above pruning level while retaining or outperforming the RF accuracy.
doi_str_mv 10.48550/arxiv.1503.04996
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1503_04996</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1503_04996</sourcerecordid><originalsourceid>FETCH-LOGICAL-a676-80aa182d3a190a93a935adb63229b57af78ad65e4bdfd668e930d0f121f06c5e3</originalsourceid><addsrcrecordid>eNotz8FKxDAYBOBcPMjqA3gyL9CaNE2aHJelq8LCyrr38rf5I4E2KUld1rdXqzAwpxn4CHngrKy1lOwJ0tVfSi6ZKFltjLol78dA2-uScEL6lj6DDx80OnqCYONE9zFhXmgbMk79iJm6mOgJYSwWvw7Q-mHxF6TbeR79AIuPId-RGwdjxvv_3pDzvj3vXorD8fl1tz0UoBpVaAbAdWUFcMPAiJ9IsL0SVWV62YBrNFglse6ts0ppNIJZ5njFHVODRLEhj3-3q6qbk58gfXW_um7ViW8dcUqu</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications</title><source>arXiv.org</source><creator>Fawagreh, Khaled ; Gaber, Mohamad Medhat ; Elyan, Eyad</creator><creatorcontrib>Fawagreh, Khaled ; Gaber, Mohamad Medhat ; Elyan, Eyad</creatorcontrib><description>Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is a significant diversity among the constituent models, the objective of this paper is twofold. 
First, it investigates how data clustering (a well known diversity technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to produce an extension of RF termed CLUB-DRF that is much smaller in size than RF, and yet performs at least as good as RF, and mostly exhibits higher performance in terms of accuracy. The latter refers to a known technique called ensemble pruning. Experimental results on 15 real datasets from the UCI repository prove the superiority of our proposed extension over the traditional RF. Most of our experiments achieved at least 95% or above pruning level while retaining or outperforming the RF accuracy.</description><identifier>DOI: 10.48550/arxiv.1503.04996</identifier><language>eng</language><subject>Computer Science - Learning</subject><creationdate>2015-03</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1503.04996$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1503.04996$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Fawagreh, Khaled</creatorcontrib><creatorcontrib>Gaber, Mohamad Medhat</creatorcontrib><creatorcontrib>Elyan, Eyad</creatorcontrib><title>On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications</title><description>Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. 
Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is a significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well known diversity technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to produce an extension of RF termed CLUB-DRF that is much smaller in size than RF, and yet performs at least as good as RF, and mostly exhibits higher performance in terms of accuracy. The latter refers to a known technique called ensemble pruning. Experimental results on 15 real datasets from the UCI repository prove the superiority of our proposed extension over the traditional RF. 
Most of our experiments achieved at least 95% or above pruning level while retaining or outperforming the RF accuracy.</description><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz8FKxDAYBOBcPMjqA3gyL9CaNE2aHJelq8LCyrr38rf5I4E2KUld1rdXqzAwpxn4CHngrKy1lOwJ0tVfSi6ZKFltjLol78dA2-uScEL6lj6DDx80OnqCYONE9zFhXmgbMk79iJm6mOgJYSwWvw7Q-mHxF6TbeR79AIuPId-RGwdjxvv_3pDzvj3vXorD8fl1tz0UoBpVaAbAdWUFcMPAiJ9IsL0SVWV62YBrNFglse6ts0ppNIJZ5njFHVODRLEhj3-3q6qbk58gfXW_um7ViW8dcUqu</recordid><startdate>20150317</startdate><enddate>20150317</enddate><creator>Fawagreh, Khaled</creator><creator>Gaber, Mohamad Medhat</creator><creator>Elyan, Eyad</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20150317</creationdate><title>On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications</title><author>Fawagreh, Khaled ; Gaber, Mohamad Medhat ; Elyan, Eyad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a676-80aa182d3a190a93a935adb63229b57af78ad65e4bdfd668e930d0f121f06c5e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Fawagreh, Khaled</creatorcontrib><creatorcontrib>Gaber, Mohamad Medhat</creatorcontrib><creatorcontrib>Elyan, Eyad</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fawagreh, Khaled</au><au>Gaber, Mohamad Medhat</au><au>Elyan, Eyad</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On Extreme Pruning of Random Forest Ensembles for Real-time Predictive 
Applications</atitle><date>2015-03-17</date><risdate>2015</risdate><abstract>Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for enhancing and improving its performance accuracy. This explains why, over the past decade, there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. Since it has been proven empirically that ensembles tend to yield better results when there is a significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how data clustering (a well known diversity technique) can be applied to identify groups of similar decision trees in an RF in order to eliminate redundant trees by selecting a representative from each group (cluster). Second, these likely diverse representatives are then used to produce an extension of RF termed CLUB-DRF that is much smaller in size than RF, and yet performs at least as good as RF, and mostly exhibits higher performance in terms of accuracy. The latter refers to a known technique called ensemble pruning. Experimental results on 15 real datasets from the UCI repository prove the superiority of our proposed extension over the traditional RF. Most of our experiments achieved at least 95% or above pruning level while retaining or outperforming the RF accuracy.</abstract><doi>10.48550/arxiv.1503.04996</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1503.04996
ispartof
issn
language eng
recordid cdi_arxiv_primary_1503_04996
source arXiv.org
subjects Computer Science - Learning
title On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T08%3A59%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20Extreme%20Pruning%20of%20Random%20Forest%20Ensembles%20for%20Real-time%20Predictive%20Applications&rft.au=Fawagreh,%20Khaled&rft.date=2015-03-17&rft_id=info:doi/10.48550/arxiv.1503.04996&rft_dat=%3Carxiv_GOX%3E1503_04996%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
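
The description field above outlines the pruning idea only at a high level: cluster similar trees, keep one representative per cluster, and vote with the representatives. The following is a minimal sketch of that idea under stated assumptions, not the authors' CLUB-DRF implementation. The record does not say which clustering algorithm or selection rule the paper uses, so k-medoids over validation-set prediction vectors and "most accurate member" are illustrative choices. Each tree is represented only by its list of predictions, and all function names and data are hypothetical.

```python
from collections import Counter


def disagreement(a, b):
    """Fraction of samples on which two prediction vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def accuracy(pred, y_true):
    """Fraction of samples predicted correctly."""
    return sum(p == t for p, t in zip(pred, y_true)) / len(y_true)


def cluster_trees(preds, k, max_iter=20):
    """Group trees with k-medoids on prediction disagreement.

    Assumption: k-medoids with farthest-first initialisation is used here
    only for illustration; requires 1 <= k <= len(preds).
    """
    medoids = [0]
    while len(medoids) < k:
        candidates = (i for i in range(len(preds)) if i not in medoids)
        medoids.append(max(
            candidates,
            key=lambda i: min(disagreement(preds[i], preds[m]) for m in medoids),
        ))
    for _ in range(max_iter):
        # Assign every tree to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(preds):
            nearest = min(medoids, key=lambda m: disagreement(p, preds[m]))
            clusters[nearest].append(i)
        # Re-pick each medoid as the member closest to the rest of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(
                disagreement(preds[c], preds[o]) for o in members))
            for members in clusters.values() if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [members for members in clusters.values() if members]


def prune_forest(preds, y_val, k):
    """Keep one tree per cluster: the most accurate on validation data
    (an assumed selection rule)."""
    clusters = cluster_trees(preds, k)
    return sorted(
        max(members, key=lambda i: accuracy(preds[i], y_val))
        for members in clusters
    )


def majority_vote(preds, selected):
    """Combine the selected trees' predictions by plurality vote."""
    n_samples = len(preds[0])
    return [
        Counter(preds[i][j] for i in selected).most_common(1)[0][0]
        for j in range(n_samples)
    ]


# Toy validation labels and per-tree predictions (made-up data).
Y_VAL = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
PREDS = [
    [1, 0, 0, 1, 1, 0, 1, 0, 1, 0],  # trees 0-2 err on early samples
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 1, 0, 0, 1],  # trees 3-5 err on late samples
    [0, 1, 0, 1, 1, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0, 1, 1, 0, 1],
]

if __name__ == "__main__":
    kept = prune_forest(PREDS, Y_VAL, k=2)
    print("kept trees:", kept)
    print("pruned ensemble vote:", majority_vote(PREDS, kept))
```

In this sketch the number of clusters `k` directly sets the size of the pruned ensemble, so the pruning level is `1 - k / len(PREDS)`. With a real forest, the prediction vectors would come from evaluating each fitted tree on held-out data.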