Novel Aggregate Deletion/Substitution/Addition Learning Algorithms for Recursive Partitioning

Bibliographic details
Main authors: Olshen, Adam B.; Strawderman, Robert L.; Ryslik, Gregory; Lostritto, Karen; Arnold, Alice M.; Molinaro, Annette M.
Format: Dataset
Language: English
Description
Summary: Many complex diseases are caused by a variety of both genetic and environmental factors acting in conjunction. To help understand these relationships, nonparametric methods that use aggregate learning, such as random forests and conditional forests, have been developed. Molinaro et al. (2010) described a powerful, single-model approach called partDSA that has the advantage of producing interpretable models. We propose two extensions to the partDSA algorithm called bagged partDSA and boosted partDSA. These algorithms achieve higher prediction accuracies than individual partDSA objects by aggregating over a set of partDSA objects. Further, by using partDSA objects in the ensemble, each base learner creates decision rules using both “and” and “or” statements, which allows for natural logical constructs. We also provide four variable ranking techniques that aid in identifying the most important individual factors in the models. In the regression context, we compared bagged partDSA and boosted partDSA to random forests and conditional forests. Using simulated and real data, we found that bagged partDSA had lower prediction error than the other methods if the data were generated by a simple logic model, and that it performed similarly for other generating mechanisms. We also found that boosted partDSA was effective for a particularly complex case. Taken together, these results suggest that the new methods are useful additions to the ensemble learning toolbox. We implement these algorithms as part of the partDSA R package. Supplementary materials for this article are available online.
DOI: 10.6084/m9.figshare.4892000