Deep learning–based genome‐wide association analysis in Alzheimer’s disease

Bibliographic details
Published in: Alzheimer's & dementia 2021-12, Vol.17 (S5), p.n/a
Main authors: Jo, Taeho, Nho, Kwangsik, Saykin, Andrew J.
Format: Article
Language: English
Description
Abstract: Background Genome-wide association study (GWAS) designs are widely used to identify genetic loci associated with Alzheimer's disease (AD) by performing a statistical test for each single-nucleotide polymorphism (SNP). This creates a significant multiple testing challenge. Deep learning has demonstrated a remarkable ability to identify non-linear patterns in large data sets, but its application to AD genetics has been limited. Here we report preliminary results of a deep learning framework developed to identify AD-associated genetic variation on a genome-wide scale.
Method We used genome-wide genotyping data (12,448,786 SNPs following imputation) from 916 participants in the Alzheimer's Disease Neuroimaging Initiative (458 cognitively normal controls and 458 AD patients). A convolutional neural network (CNN) consisting of convolutional, pooling and fully connected Softmax layers was used in a two-stage approach. The data were divided into training-testing-validation sets (60:20:20 ratio). Area under the curve (AUC) was used to assess model performance.
Result The first stage of the deep learning approach identified 2,335 candidate genetic regions (93,400 SNPs) as associated with AD. The second stage investigated the association of the identified SNPs with AD by calculating a p-value for each SNP based on AD influence z-scores derived from the deep learning model. This approach identified genetic loci in the APOE region as most highly associated with AD (p-value < 5 × 10⁻⁸). Case/control classification using the identified SNPs yielded mean AUCs of 0.74, 0.79, 0.82, and 0.90 for the p-value thresholds of 1 × 10⁻⁵ (114 SNPs), 1 × 10⁻⁴ (243 SNPs), 1 × 10⁻³ (724 SNPs) and 1 × 10⁻² (2,846 SNPs), respectively (Fig. 1); the corresponding mean accuracies were 0.66, 0.69, 0.73, and 0.81.
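The second-stage step of turning per-SNP z-scores into p-values and then selecting SNPs at a series of thresholds can be illustrated with a minimal Python sketch. The abstract does not say how the AD influence z-scores are derived or whether the test is one- or two-sided, so this sketch assumes a two-sided test against a standard normal null. The SNP names and z-score values are hypothetical and are not study data.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score, assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def count_below(p_values, thresholds):
    """Count how many p-values fall strictly below each threshold."""
    p_values = list(p_values)
    return {t: sum(p < t for p in p_values) for t in thresholds}

# Hypothetical z-scores for five SNPs (illustrative only, not study data).
z_scores = {"snp_a": 6.1, "snp_b": 4.5, "snp_c": -3.4, "snp_d": 2.7, "snp_e": 0.3}
p_values = {snp: two_sided_p(z) for snp, z in z_scores.items()}

# Conventional genome-wide significance threshold quoted in the abstract.
GENOME_WIDE = 5e-8
significant = [snp for snp, p in p_values.items() if p < GENOME_WIDE]

# SNP counts at the four thresholds used for case/control classification.
counts = count_below(p_values.values(), [1e-5, 1e-4, 1e-3, 1e-2])
```

Loosening the threshold can only add SNPs to the selected set, which matches the growing SNP counts reported alongside each threshold in the Result.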
Conclusion Preliminary results indicate that a deep learning approach can be used to identify AD-associated genetic loci and to reduce the computational complexity of detecting nonlinear interactions between SNPs, which may yield enhanced prediction accuracy for AD risk using genetic information. Future refinements of the deep learning framework are planned, including methods to reduce computational time and integrate additional omics layers and clinical data, as well as validation with independent replication samples.
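The evaluation setup named in the Method (a 60:20:20 training-testing-validation split and AUC as the performance measure) can be sketched in plain Python as follows. This is an illustration under assumptions rather than the authors' pipeline: the abstract does not say whether the split was stratified or repeated, so the sketch uses a single seeded random shuffle, and the helper names `split_60_20_20` and `auc` are hypothetical.

```python
import random

def split_60_20_20(items, seed=0):
    """Shuffle items and split them into 60% / 20% / 20% partitions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_first = int(0.6 * len(items))
    n_second = int(0.2 * len(items))
    return (items[:n_first],
            items[n_first:n_first + n_second],
            items[n_first + n_second:])

def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 916 participant indices, matching the cohort size in the abstract.
participants = list(range(916))
train, test, validation = split_60_20_20(participants)

# Toy labels (1 = AD, 0 = control) and classifier scores, illustrative only.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise AUC above is quadratic in the number of samples, which is fine at this cohort size; a rank-based computation would be the usual choice for larger data.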
ISSN: 1552-5260, 1552-5279
DOI: 10.1002/alz.056510