Random Planted Forest: a directly interpretable tree ensemble

We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub https://github.com/PlantedML/randomPlantedForest.
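The abstract's special case of an interaction bound of one, where the estimator reduces to a sum of one-dimensional functions, can be illustrated with a toy sketch. The code below is not the authors' algorithm (their implementation is at the GitHub repository cited in the record); it is a boosting-style stand-in that greedily fits axis-aligned stumps on residuals, so that the final prediction is an additive sum of one-dimensional piecewise-constant functions. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def fit_additive_stumps(X, y, n_rounds=100, lr=0.1):
    """Greedily fit one-dimensional stumps on residuals.
    Because every stump depends on a single covariate, the resulting
    estimator is a sum of one-dimensional functions, mimicking the
    interaction-bound-one setting described in the abstract."""
    n, d = X.shape
    pred = np.full(n, y.mean())
    stumps = []  # each entry: (feature index, threshold, left value, right value)
    for _ in range(n_rounds):
        r = y - pred
        best = None
        for j in range(d):
            # candidate thresholds: a few quantiles of feature j
            for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
                left = X[:, j] <= t
                if left.all() or (~left).all():
                    continue
                vl, vr = r[left].mean(), r[~left].mean()
                sse = ((r[left] - vl) ** 2).sum() + ((r[~left] - vr) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, vl, vr)
        _, j, t, vl, vr = best
        # shrink the update by the learning rate, boosting-style
        stumps.append((j, t, lr * vl, lr * vr))
        pred += np.where(X[:, j] <= t, lr * vl, lr * vr)
    return y.mean(), stumps

def predict_additive(model, X):
    """Sum the one-dimensional stump contributions plus the base level."""
    base, stumps = model
    pred = np.full(len(X), base)
    for j, t, vl, vr in stumps:
        pred += np.where(X[:, j] <= t, vl, vr)
    return pred
```

Because each fitted component touches only one covariate, the model can be inspected coordinate by coordinate, which is the interpretability property the paper emphasises for low interaction bounds.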

Detailed Description

Saved in:
Bibliographic Details
Main authors: Hiabu, Munir; Mammen, Enno; Meyer, Joseph T
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Hiabu, Munir; Mammen, Enno; Meyer, Joseph T
description We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub https://github.com/PlantedML/randomPlantedForest.
doi_str_mv 10.48550/arxiv.2012.14563
format Article
creationdate 2020-12-28
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
identifier DOI: 10.48550/arxiv.2012.14563
language eng
recordid cdi_arxiv_primary_2012_14563
source arXiv.org
subjects Computer Science - Learning
Mathematics - Statistics Theory
Statistics - Machine Learning
Statistics - Theory
title Random Planted Forest: a directly interpretable tree ensemble