Random Planted Forest: a directly interpretable tree ensemble

We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub https://github.com/PlantedML/randomPlantedForest.
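The abstract's special case of an interaction bound of one, where the estimator reduces to a sum of one-dimensional functions, can be illustrated with a toy sketch. The code below is not the authors' algorithm (their implementation is at the GitHub repository cited in the record); it is a boosting-style stand-in that greedily fits axis-aligned stumps on residuals, so that the final prediction is an additive sum of one-dimensional piecewise-constant functions. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def fit_additive_stumps(X, y, n_rounds=100, lr=0.1):
    """Greedily fit one-dimensional stumps on residuals.
    Because every stump depends on a single covariate, the resulting
    estimator is a sum of one-dimensional functions, mimicking the
    interaction-bound-one setting described in the abstract."""
    n, d = X.shape
    pred = np.full(n, y.mean())
    stumps = []  # each entry: (feature index, threshold, left value, right value)
    for _ in range(n_rounds):
        r = y - pred
        best = None
        for j in range(d):
            # candidate thresholds: a few quantiles of feature j
            for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
                left = X[:, j] <= t
                if left.all() or (~left).all():
                    continue
                vl, vr = r[left].mean(), r[~left].mean()
                sse = ((r[left] - vl) ** 2).sum() + ((r[~left] - vr) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, vl, vr)
        _, j, t, vl, vr = best
        # shrink the update by the learning rate, boosting-style
        stumps.append((j, t, lr * vl, lr * vr))
        pred += np.where(X[:, j] <= t, lr * vl, lr * vr)
    return y.mean(), stumps

def predict_additive(model, X):
    """Sum the one-dimensional stump contributions plus the base level."""
    base, stumps = model
    pred = np.full(len(X), base)
    for j, t, vl, vr in stumps:
        pred += np.where(X[:, j] <= t, vl, vr)
    return pred
```

Because each fitted component touches only one covariate, the model can be inspected coordinate by coordinate, which is the interpretability property the paper emphasises for low interaction bounds.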

Detailed Description

Saved in:
Bibliographic Details
Main authors: Hiabu, Munir; Mammen, Enno; Meyer, Joseph T
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Hiabu, Munir; Mammen, Enno; Meyer, Joseph T
description We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub https://github.com/PlantedML/randomPlantedForest.
doi_str_mv 10.48550/arxiv.2012.14563
format Article
creationdate 2020-12-28
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
identifier DOI: 10.48550/arxiv.2012.14563
language eng
recordid cdi_arxiv_primary_2012_14563
source arXiv.org
subjects Computer Science - Learning
Mathematics - Statistics Theory
Statistics - Machine Learning
Statistics - Theory
title Random Planted Forest: a directly interpretable tree ensemble