Bootstrap Aggregating of Alternating Decision Trees to Detect Sets of SNPs That Associate With Disease
Published in: | Genetic epidemiology 2012-02, Vol.36 (2), p.99-106 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Summary: | Complex genetic disorders are a result of a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low-complexity algorithm capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is order , where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of the method using simulated data as well as data from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease. |
---|---|
ISSN: | 0741-0395 1098-2272 |
DOI: | 10.1002/gepi.21608 |
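
The summary describes bootstrap aggregating (bagging) as a way to find structural elements that recur across trees built on resampled data. The following is a minimal sketch of that general idea only, not the BADTrees algorithm from the article: it swaps in a single-split "stump" in place of an Alternating Decision Tree, and the function names (`best_stump`, `bagged_selection_frequency`), the scoring rule, and the toy data are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def best_stump(genotypes, labels, rows):
    """Pick the SNP whose carrier / non-carrier split best separates cases from controls.

    Stand-in for fitting a tree: the score is the absolute difference in case
    proportion between carriers (genotype > 0) and non-carriers (genotype == 0).
    """
    best_snp, best_score = None, -1.0
    for j in range(len(genotypes[0])):
        carriers = [labels[i] for i in rows if genotypes[i][j] > 0]
        others = [labels[i] for i in rows if genotypes[i][j] == 0]
        if not carriers or not others:
            continue  # this SNP does not split the resampled data
        score = abs(sum(carriers) / len(carriers) - sum(others) / len(others))
        if score > best_score:
            best_snp, best_score = j, score
    return best_snp


def bagged_selection_frequency(genotypes, labels, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each SNP is the selected split."""
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(n_boot):
        # Bootstrap resample: draw n individuals with replacement.
        rows = [rng.randrange(n) for _ in range(n)]
        snp = best_stump(genotypes, labels, rows)
        if snp is not None:
            counts[snp] += 1
    return {snp: c / n_boot for snp, c in counts.items()}


# Toy data (assumed): 40 individuals, 5 SNPs coded as minor-allele counts.
# SNP 2 tracks case status exactly; the other SNPs are balanced across cases and controls.
labels = [i % 2 for i in range(40)]
genotypes = [[(i // 2 + j) % 3 for j in range(5)] for i in range(40)]
for i in range(40):
    genotypes[i][2] = labels[i]

freq = bagged_selection_frequency(genotypes, labels, n_boot=100, seed=1)
print(sorted(freq.items(), key=lambda kv: -kv[1]))
```

SNPs that are selected in a large share of resamples are the ones the aggregate treats as stable; a SNP chosen only occasionally is more likely to reflect sampling noise in a single tree.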