Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning
Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) describe...
Main authors: | Olshen, Adam B.; Strawderman, Robert L.; Ryslik, Gregory; Lostritto, Karen; Arnold, Alice M.; Molinaro, Annette M. |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Olshen, Adam B.; Strawderman, Robert L.; Ryslik, Gregory; Lostritto, Karen; Arnold, Alice M.; Molinaro, Annette M. |
description | Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. Supplementary materials for this article are available online. |
doi_str_mv | 10.6084/m9.figshare.4892000 |
format | Dataset |
fullrecord | <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_6084_m9_figshare_4892000</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_6084_m9_figshare_4892000</sourcerecordid><originalsourceid>FETCH-LOGICAL-d890-27c7989607c8cece80d55fde8c5af5989ba2046d444a9c8185e85b18b63e15183</originalsourceid><addsrcrecordid>eNo1j8tOwzAURL1hgQpfwMY_kNRu7OR6GZWnFAGCbpHl2DeupTyQ41Ti72lLWc2MZjTSIeSOs7xkINaDyrvg572JmAtQG8bYNfl6nQ7Y09r7iN4kpPfYYwrTuP5c2jmFtJxD7Vw4GdqgiWMYPa17P8WQ9sNMuynSD7RLnMMB6buJ6bw9rm7IVWf6GW8vuiK7x4fd9jlr3p5etnWTOVAs21S2UqBKVlmwaBGYk7JzCFaaTh6b1myYKJ0QwigLHCSCbDm0ZYFccihWpPi7dSYZGxLq7xgGE380Z_rErgel_9n1hb34BU5EV2M</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><source>DataCite</source><creator>Olshen, Adam B. ; Strawderman, Robert L. ; Ryslik, Gregory ; Lostritto, Karen ; Arnold, Alice M. ; Molinaro, Annette M.</creator><creatorcontrib>Olshen, Adam B. ; Strawderman, Robert L. ; Ryslik, Gregory ; Lostritto, Karen ; Arnold, Alice M. ; Molinaro, Annette M.</creatorcontrib><description>Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. 
Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. Supplementary materials for this article are available online.</description><identifier>DOI: 10.6084/m9.figshare.4892000</identifier><language>eng</language><publisher>Taylor & Francis</publisher><subject>Biological Sciences not elsewhere classified ; FOS: Biological sciences ; FOS: Computer and information sciences ; FOS: Mathematics ; FOS: Sociology ; Genetics ; Information Systems not elsewhere classified ; Mathematical Sciences not elsewhere classified ; Medicine ; Sociology</subject><creationdate>2017</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,1894</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.6084/m9.figshare.4892000$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Olshen, Adam B.</creatorcontrib><creatorcontrib>Strawderman, Robert 
L.</creatorcontrib><creatorcontrib>Ryslik, Gregory</creatorcontrib><creatorcontrib>Lostritto, Karen</creatorcontrib><creatorcontrib>Arnold, Alice M.</creatorcontrib><creatorcontrib>Molinaro, Annette M.</creatorcontrib><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><description>Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. (2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. 
Supplementary materials for this article are available online.</description><subject>Biological Sciences not elsewhere classified</subject><subject>FOS: Biological sciences</subject><subject>FOS: Computer and information sciences</subject><subject>FOS: Mathematics</subject><subject>FOS: Sociology</subject><subject>Genetics</subject><subject>Information Systems not elsewhere classified</subject><subject>Mathematical Sciences not elsewhere classified</subject><subject>Medicine</subject><subject>Sociology</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2017</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNo1j8tOwzAURL1hgQpfwMY_kNRu7OR6GZWnFAGCbpHl2DeupTyQ41Ti72lLWc2MZjTSIeSOs7xkINaDyrvg572JmAtQG8bYNfl6nQ7Y09r7iN4kpPfYYwrTuP5c2jmFtJxD7Vw4GdqgiWMYPa17P8WQ9sNMuynSD7RLnMMB6buJ6bw9rm7IVWf6GW8vuiK7x4fd9jlr3p5etnWTOVAs21S2UqBKVlmwaBGYk7JzCFaaTh6b1myYKJ0QwigLHCSCbDm0ZYFccihWpPi7dSYZGxLq7xgGE380Z_rErgel_9n1hb34BU5EV2M</recordid><startdate>20170419</startdate><enddate>20170419</enddate><creator>Olshen, Adam B.</creator><creator>Strawderman, Robert L.</creator><creator>Ryslik, Gregory</creator><creator>Lostritto, Karen</creator><creator>Arnold, Alice M.</creator><creator>Molinaro, Annette M.</creator><general>Taylor & Francis</general><scope>DYCCY</scope><scope>PQ8</scope></search><sort><creationdate>20170419</creationdate><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><author>Olshen, Adam B. ; Strawderman, Robert L. ; Ryslik, Gregory ; Lostritto, Karen ; Arnold, Alice M. 
; Molinaro, Annette M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d890-27c7989607c8cece80d55fde8c5af5989ba2046d444a9c8185e85b18b63e15183</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Biological Sciences not elsewhere classified</topic><topic>FOS: Biological sciences</topic><topic>FOS: Computer and information sciences</topic><topic>FOS: Mathematics</topic><topic>FOS: Sociology</topic><topic>Genetics</topic><topic>Information Systems not elsewhere classified</topic><topic>Mathematical Sciences not elsewhere classified</topic><topic>Medicine</topic><topic>Sociology</topic><toplevel>online_resources</toplevel><creatorcontrib>Olshen, Adam B.</creatorcontrib><creatorcontrib>Strawderman, Robert L.</creatorcontrib><creatorcontrib>Ryslik, Gregory</creatorcontrib><creatorcontrib>Lostritto, Karen</creatorcontrib><creatorcontrib>Arnold, Alice M.</creatorcontrib><creatorcontrib>Molinaro, Annette M.</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Olshen, Adam B.</au><au>Strawderman, Robert L.</au><au>Ryslik, Gregory</au><au>Lostritto, Karen</au><au>Arnold, Alice M.</au><au>Molinaro, Annette M.</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning</title><date>2017-04-19</date><risdate>2017</risdate><abstract>Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning have been developed such as random forests and conditional forests. Molinaro et al. 
(2010) described a powerful, single model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects through aggregating over a set of partDSA objects. Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. Supplementary materials for this article are available online.</abstract><pub>Taylor & Francis</pub><doi>10.6084/m9.figshare.4892000</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.6084/m9.figshare.4892000 |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_6084_m9_figshare_4892000 |
source | DataCite |
subjects | Biological Sciences not elsewhere classified; FOS: Biological sciences; FOS: Computer and information sciences; FOS: Mathematics; FOS: Sociology; Genetics; Information Systems not elsewhere classified; Mathematical Sciences not elsewhere classified; Medicine; Sociology |
title | Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T15%3A59%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Olshen,%20Adam%20B.&rft.date=2017-04-19&rft_id=info:doi/10.6084/m9.figshare.4892000&rft_dat=%3Cdatacite_PQ8%3E10_6084_m9_figshare_4892000%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |