Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning

Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) describe...

Bibliographic details
Main authors: Olshen, Adam B., Strawderman, Robert L., Ryslik, Gregory, Lostritto, Karen, Arnold, Alice M., Molinaro, Annette M.
Format: Dataset
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Olshen, Adam B.
Strawderman, Robert L.
Ryslik, Gregory
Lostritto, Karen
Arnold, Alice M.
Molinaro, Annette M.
description Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. Supplementary materials for this article are available online.
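The bagging idea in the description (averaging predictions over base learners fit to bootstrap resamples) can be illustrated with a minimal sketch. This is not the partDSA R package or its API: the base learner here is a one-split regression stump swapped in for a partDSA model, and the names `fit_stump` and `bagged_fit` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump.

    Assumption: this is a stand-in base learner, NOT partDSA.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave observations on both sides
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # No valid split (all x identical): predict the overall mean.
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_fit(xs, ys, n_learners=25, seed=0):
    """Fit n_learners stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    learners = []
    for _ in range(n_learners):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        learners.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(f(x) for f in learners)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0.0] * 5 + [10.0] * 5  # toy step-function outcome
    predict = bagged_fit(xs, ys)
    print(predict(0), predict(9))
```

Averaging over resampled learners is what reduces the variance of a single model; the description's claim that bagged partDSA outperforms individual partDSA objects rests on the same mechanism, with partDSA models in place of the stumps used here.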
doi_str_mv 10.6084/m9.figshare.4892000
format Dataset
fullrecord <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_6084_m9_figshare_4892000</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_6084_m9_figshare_4892000</sourcerecordid><originalsourceid>FETCH-LOGICAL-d890-27c7989607c8cece80d55fde8c5af5989ba2046d444a9c8185e85b18b63e15183</originalsourceid><addsrcrecordid>eNo1j8tOwzAURL1hgQpfwMY_kNRu7OR6GZWnFAGCbpHl2DeupTyQ41Ti72lLWc2MZjTSIeSOs7xkINaDyrvg572JmAtQG8bYNfl6nQ7Y09r7iN4kpPfYYwrTuP5c2jmFtJxD7Vw4GdqgiWMYPa17P8WQ9sNMuynSD7RLnMMB6buJ6bw9rm7IVWf6GW8vuiK7x4fd9jlr3p5etnWTOVAs21S2UqBKVlmwaBGYk7JzCFaaTh6b1myYKJ0QwigLHCSCbDm0ZYFccihWpPi7dSYZGxLq7xgGE380Z_rErgel_9n1hb34BU5EV2M</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><source>DataCite</source><creator>Olshen, Adam B. ; Strawderman, Robert L. ; Ryslik, Gregory ; Lostritto, Karen ; Arnold, Alice M. ; Molinaro, Annette M.</creator><creatorcontrib>Olshen, Adam B. ; Strawderman, Robert L. ; Ryslik, Gregory ; Lostritto, Karen ; Arnold, Alice M. ; Molinaro, Annette M.</creatorcontrib><description>Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. 
Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. Supplementary materials for this article are available online.</description><identifier>DOI: 10.6084/m9.figshare.4892000</identifier><language>eng</language><publisher>Taylor &amp; Francis</publisher><subject>Biological Sciences not elsewhere classified ; FOS: Biological sciences ; FOS: Computer and information sciences ; FOS: Mathematics ; FOS: Sociology ; Genetics ; Information Systems not elsewhere classified ; Mathematical Sciences not elsewhere classified ; Medicine ; Sociology</subject><creationdate>2017</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,1894</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.6084/m9.figshare.4892000$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Olshen, Adam B.</creatorcontrib><creatorcontrib>Strawderman, Robert 
L.</creatorcontrib><creatorcontrib>Ryslik, Gregory</creatorcontrib><creatorcontrib>Lostritto, Karen</creatorcontrib><creatorcontrib>Arnold, Alice M.</creatorcontrib><creatorcontrib>Molinaro, Annette M.</creatorcontrib><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><description>Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. 
Supplementary materials for this article are available online.</description><subject>Biological Sciences not elsewhere classified</subject><subject>FOS: Biological sciences</subject><subject>FOS: Computer and information sciences</subject><subject>FOS: Mathematics</subject><subject>FOS: Sociology</subject><subject>Genetics</subject><subject>Information Systems not elsewhere classified</subject><subject>Mathematical Sciences not elsewhere classified</subject><subject>Medicine</subject><subject>Sociology</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2017</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNo1j8tOwzAURL1hgQpfwMY_kNRu7OR6GZWnFAGCbpHl2DeupTyQ41Ti72lLWc2MZjTSIeSOs7xkINaDyrvg572JmAtQG8bYNfl6nQ7Y09r7iN4kpPfYYwrTuP5c2jmFtJxD7Vw4GdqgiWMYPa17P8WQ9sNMuynSD7RLnMMB6buJ6bw9rm7IVWf6GW8vuiK7x4fd9jlr3p5etnWTOVAs21S2UqBKVlmwaBGYk7JzCFaaTh6b1myYKJ0QwigLHCSCbDm0ZYFccihWpPi7dSYZGxLq7xgGE380Z_rErgel_9n1hb34BU5EV2M</recordid><startdate>20170419</startdate><enddate>20170419</enddate><creator>Olshen, Adam B.</creator><creator>Strawderman, Robert L.</creator><creator>Ryslik, Gregory</creator><creator>Lostritto, Karen</creator><creator>Arnold, Alice M.</creator><creator>Molinaro, Annette M.</creator><general>Taylor &amp; Francis</general><scope>DYCCY</scope><scope>PQ8</scope></search><sort><creationdate>20170419</creationdate><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><author>Olshen, Adam B. ; Strawderman, Robert L. ; Ryslik, Gregory ; Lostritto, Karen ; Arnold, Alice M. 
; Molinaro, Annette M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d890-27c7989607c8cece80d55fde8c5af5989ba2046d444a9c8185e85b18b63e15183</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Biological Sciences not elsewhere classified</topic><topic>FOS: Biological sciences</topic><topic>FOS: Computer and information sciences</topic><topic>FOS: Mathematics</topic><topic>FOS: Sociology</topic><topic>Genetics</topic><topic>Information Systems not elsewhere classified</topic><topic>Mathematical Sciences not elsewhere classified</topic><topic>Medicine</topic><topic>Sociology</topic><toplevel>online_resources</toplevel><creatorcontrib>Olshen, Adam B.</creatorcontrib><creatorcontrib>Strawderman, Robert L.</creatorcontrib><creatorcontrib>Ryslik, Gregory</creatorcontrib><creatorcontrib>Lostritto, Karen</creatorcontrib><creatorcontrib>Arnold, Alice M.</creatorcontrib><creatorcontrib>Molinaro, Annette M.</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Olshen, Adam B.</au><au>Strawderman, Robert L.</au><au>Ryslik, Gregory</au><au>Lostritto, Karen</au><au>Arnold, Alice M.</au><au>Molinaro, Annette M.</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><date>2017-04-19</date><risdate>2017</risdate><abstract>Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. 
(2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. Supplementary materials for this article are available online.</abstract><pub>Taylor &amp; Francis</pub><doi>10.6084/m9.figshare.4892000</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.6084/m9.figshare.4892000
ispartof
issn
language eng
recordid cdi_datacite_primary_10_6084_m9_figshare_4892000
source DataCite
subjects Biological Sciences not elsewhere classified
FOS: Biological sciences
FOS: Computer and information sciences
FOS: Mathematics
FOS: Sociology
Genetics
Information Systems not elsewhere classified
Mathematical Sciences not elsewhere classified
Medicine
Sociology
title Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T15%3A59%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Olshen,%20Adam%20B.&rft.date=2017-04-19&rft_id=info:doi/10.6084/m9.figshare.4892000&rft_dat=%3Cdatacite_PQ8%3E10_6084_m9_figshare_4892000%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true