Bootstrap Aggregating of Alternating Decision Trees to Detect Sets of SNPs That Associate With Disease

Bibliographic Details
Published in: Genetic Epidemiology 2012-02, Vol. 36 (2), p. 99-106
Main authors: Guy, Richard T., Santago, Peter, Langefeld, Carl D.
Format: Article
Language: English
Online access: Full text
Description
Summary: Complex genetic disorders are a result of a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low complexity algorithm capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is order , where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of these data using simulated data as well as from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease.
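
The bagging idea in the summary (fit a tree to each bootstrap resample, then look for elements that recur across the fitted trees) can be illustrated with a minimal sketch. This is not the BADTrees algorithm: a one-SNP decision stump stands in for an Alternating Decision Tree, and "common structural elements" is reduced to counting how often each SNP is chosen. The function names (`best_stump`, `bagged_selection_counts`, `make_toy_data`), the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def best_stump(genotypes, labels):
    """Return the index of the SNP whose best single-threshold split
    misclassifies the fewest samples (a stump, standing in for an ADTree)."""
    n_snps = len(genotypes[0])
    best_idx, best_err = 0, len(labels) + 1
    for j in range(n_snps):
        for threshold in (0, 1):  # genotypes assumed coded as 0/1/2 allele counts
            pred = [1 if row[j] > threshold else 0 for row in genotypes]
            wrong = sum(p != y for p, y in zip(pred, labels))
            err = min(wrong, len(labels) - wrong)  # allow either polarity
            if err < best_err:
                best_idx, best_err = j, err
    return best_idx


def bagged_selection_counts(genotypes, labels, n_bootstrap=50, seed=0):
    """Fit one stump per bootstrap resample and count which SNP each selects."""
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        boot_x = [genotypes[i] for i in idx]
        boot_y = [labels[i] for i in idx]
        counts[best_stump(boot_x, boot_y)] += 1
    return counts


def make_toy_data(n_samples=200, n_snps=5, causal=2, seed=1):
    """Simulate genotypes where one SNP (index `causal`) drives case status."""
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snps)]
                 for _ in range(n_samples)]
    labels = []
    for row in genotypes:
        y = 1 if row[causal] > 0 else 0
        if rng.random() < 0.1:  # 10% label noise
            y = 1 - y
        labels.append(y)
    return genotypes, labels


genotypes, labels = make_toy_data()
counts = bagged_selection_counts(genotypes, labels)
top_snp, top_count = counts.most_common(1)[0]
print(f"SNP {top_snp} selected in {top_count} of {sum(counts.values())} bootstraps")
```

Counting selections across resamples gives a per-SNP stability score, which is one simple way to address the interpretability concern the summary raises about reading importance from a single tree.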
ISSN: 0741-0395, 1098-2272
DOI: 10.1002/gepi.21608