Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification

Abstract Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) for identif...

Detailed description

Bibliographic details
Published in: Briefings in bioinformatics 2022-03, Vol.23 (2)
Main authors: Jo, Taeho; Nho, Kwangsik; Bice, Paula; Saykin, Andrew J
Format: Article
Language: eng
container_end_page
container_issue 2
container_start_page
container_title Briefings in bioinformatics
container_volume 23
creator Jo, Taeho
Nho, Kwangsik
Bice, Paula
Saykin, Andrew J
description Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) that uses deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs), which can then be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran a convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran a CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran a CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (N = 981; cognitively normal older adults (CN) = 650 and Alzheimer’s disease (AD) = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was comparable to traditional machine learning approaches, random forest and XGBoost. SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and yielded a classification model for AD, and may hold promise for a range of biomedical applications.
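The fragment-then-slide scheme described in the abstract can be illustrated with a minimal sketch. It only shows how SNP indices might be partitioned into nonoverlapping fragments and how overlapping windows might then be slid across a selected fragment. The CNN and the phenotype influence score are not reproduced here, since the abstract does not define them. The function names, the fragment size, the window size and the stride are all hypothetical and are not taken from the article.

```python
from typing import Iterator, List, Tuple


def nonoverlapping_fragments(n_snps: int, fragment_size: int) -> List[Tuple[int, int]]:
    """Split SNP indices 0..n_snps-1 into consecutive half-open ranges [start, end).

    Hypothetical helper: the article's actual fragment size is not given here.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        (start, min(start + fragment_size, n_snps))
        for start in range(0, n_snps, fragment_size)
    ]


def sliding_windows(
    start: int, end: int, window_size: int, stride: int = 1
) -> Iterator[Tuple[int, int]]:
    """Yield overlapping half-open windows inside the fragment [start, end).

    Hypothetical helper: window_size and stride are illustrative assumptions.
    A fragment shorter than window_size yields a single, truncated window.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = max(start, end - window_size)
    for w_start in range(start, last_start + 1, stride):
        yield (w_start, min(w_start + window_size, end))


# Step 1 (sketch): partition 10 SNP positions into fragments of 4.
fragments = nonoverlapping_fragments(10, 4)

# Step 2 (sketch): slide a window of 2 SNPs across the second fragment.
windows = list(sliding_windows(*fragments[1], window_size=2))
```

In a full pipeline each fragment and each window would be scored by a trained model; here the ranges are simply enumerated so the indexing can be checked in isolation.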
doi_str_mv 10.1093/bib/bbac022
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8921609</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bib/bbac022</oup_id><sourcerecordid>2640678751</sourcerecordid><originalsourceid>FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</originalsourceid><addsrcrecordid>eNp9kU1rFTEUhoMotl5duZeAIEIZm6-ZTLoQSrW2UHCj63CSOXObOncyJjMFXfk3_Hv-ElPu7UVduDqB8_Bw3ryEPOfsDWdGHrvgjp0Dz4R4QA650rpSrFYP796NrmrVyAPyJOcbxgTTLX9MDmTNW8kafki-vEOc6ICQxjCuKwcZOxo6HOfQBw9ziCONPV3jiHPw9BZSgHHOJxSmabgH5khPh-_XGDaYfv34mWkXMhYT9QPkvBc9JY96GDI-280V-Xz-_tPZRXX18cPl2elV5ZVicyWkYQZaaIWXnHtQvcBOIxOKG2dqIZ30wjMUqtaKQQmiDa9B1d5wpxqUK_J2650Wt8HOlzAJBjulsIH0zUYI9u_NGK7tOt7a1gjelC9dkdc7QYpfF8yz3YTscRhgxLhkKxrJDDfaiIK-_Ae9iUsaS7xCKdboVte8UEdbyqeYc8J-fwxn9q5EW0q0uxIL_eLP-_fsfWsFeLUF4jL91_Qbgp6n2Q</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2640678751</pqid></control><display><type>article</type><title>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</title><source>Oxford Journals Open Access Collection</source><creator>Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew J</creator><creatorcontrib>Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew J ; Alzheimer’s Disease Neuroimaging Initiative ; For The Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><description>Abstract Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. 
Here we propose a novel three-step approach (SWAT-CNN) for identification of genetic variants using deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) including (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was compatible with traditional machine learning approaches, random forest and XGBoost. 
SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and a classification model for AD and may hold promise for a range of biomedical applications.</description><identifier>ISSN: 1467-5463</identifier><identifier>ISSN: 1477-4054</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbac022</identifier><identifier>PMID: 35183061</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Aged ; Alzheimer Disease - genetics ; Alzheimer's disease ; Apolipoprotein E ; Artificial neural networks ; Biomedical materials ; Classification ; Cognitive Dysfunction - genetics ; Deep Learning ; Feature extraction ; Fragments ; Genetic diversity ; Genetic variance ; Genome-wide association studies ; Genome-Wide Association Study ; Genomes ; Genotype &amp; phenotype ; Humans ; Machine learning ; Magnetic Resonance Imaging - methods ; Medical imaging ; Model testing ; Neural networks ; Neurodegenerative diseases ; Neuroimaging ; Nucleotides ; Older people ; Phenotypes ; Problem Solving Protocol ; Single-nucleotide polymorphism</subject><ispartof>Briefings in bioinformatics, 2022-03, Vol.23 (2)</ispartof><rights>The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2022</rights><rights>The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.</rights><rights>The Author(s) 2022. Published by Oxford University Press. All rights reserved. 
For Permissions, please email: journals.permissions@oup.com</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</citedby><cites>FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</cites><orcidid>0000-0002-7624-3872 ; 0000-0002-1376-8532 ; 0000-0001-5357-9966 ; 0000-0003-1765-5735</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921609/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921609/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,724,777,781,882,1599,27905,27906,53772,53774</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bib/bbac022$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35183061$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Jo, Taeho</creatorcontrib><creatorcontrib>Nho, Kwangsik</creatorcontrib><creatorcontrib>Bice, Paula</creatorcontrib><creatorcontrib>Saykin, Andrew J</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><creatorcontrib>For The Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><title>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Abstract Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. 
Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) for identification of genetic variants using deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) including (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was compatible with traditional machine learning approaches, random forest and XGBoost. 
SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and a classification model for AD and may hold promise for a range of biomedical applications.</description><subject>Aged</subject><subject>Alzheimer Disease - genetics</subject><subject>Alzheimer's disease</subject><subject>Apolipoprotein E</subject><subject>Artificial neural networks</subject><subject>Biomedical materials</subject><subject>Classification</subject><subject>Cognitive Dysfunction - genetics</subject><subject>Deep Learning</subject><subject>Feature extraction</subject><subject>Fragments</subject><subject>Genetic diversity</subject><subject>Genetic variance</subject><subject>Genome-wide association studies</subject><subject>Genome-Wide Association Study</subject><subject>Genomes</subject><subject>Genotype &amp; phenotype</subject><subject>Humans</subject><subject>Machine learning</subject><subject>Magnetic Resonance Imaging - methods</subject><subject>Medical imaging</subject><subject>Model testing</subject><subject>Neural networks</subject><subject>Neurodegenerative diseases</subject><subject>Neuroimaging</subject><subject>Nucleotides</subject><subject>Older people</subject><subject>Phenotypes</subject><subject>Problem Solving Protocol</subject><subject>Single-nucleotide 
polymorphism</subject><issn>1467-5463</issn><issn>1477-4054</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kU1rFTEUhoMotl5duZeAIEIZm6-ZTLoQSrW2UHCj63CSOXObOncyJjMFXfk3_Hv-ElPu7UVduDqB8_Bw3ryEPOfsDWdGHrvgjp0Dz4R4QA650rpSrFYP796NrmrVyAPyJOcbxgTTLX9MDmTNW8kafki-vEOc6ICQxjCuKwcZOxo6HOfQBw9ziCONPV3jiHPw9BZSgHHOJxSmabgH5khPh-_XGDaYfv34mWkXMhYT9QPkvBc9JY96GDI-280V-Xz-_tPZRXX18cPl2elV5ZVicyWkYQZaaIWXnHtQvcBOIxOKG2dqIZ30wjMUqtaKQQmiDa9B1d5wpxqUK_J2650Wt8HOlzAJBjulsIH0zUYI9u_NGK7tOt7a1gjelC9dkdc7QYpfF8yz3YTscRhgxLhkKxrJDDfaiIK-_Ae9iUsaS7xCKdboVte8UEdbyqeYc8J-fwxn9q5EW0q0uxIL_eLP-_fsfWsFeLUF4jL91_Qbgp6n2Q</recordid><startdate>20220310</startdate><enddate>20220310</enddate><creator>Jo, Taeho</creator><creator>Nho, Kwangsik</creator><creator>Bice, Paula</creator><creator>Saykin, Andrew J</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>K9.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-7624-3872</orcidid><orcidid>https://orcid.org/0000-0002-1376-8532</orcidid><orcidid>https://orcid.org/0000-0001-5357-9966</orcidid><orcidid>https://orcid.org/0000-0003-1765-5735</orcidid></search><sort><creationdate>20220310</creationdate><title>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</title><author>Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew 
J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Aged</topic><topic>Alzheimer Disease - genetics</topic><topic>Alzheimer's disease</topic><topic>Apolipoprotein E</topic><topic>Artificial neural networks</topic><topic>Biomedical materials</topic><topic>Classification</topic><topic>Cognitive Dysfunction - genetics</topic><topic>Deep Learning</topic><topic>Feature extraction</topic><topic>Fragments</topic><topic>Genetic diversity</topic><topic>Genetic variance</topic><topic>Genome-wide association studies</topic><topic>Genome-Wide Association Study</topic><topic>Genomes</topic><topic>Genotype &amp; phenotype</topic><topic>Humans</topic><topic>Machine learning</topic><topic>Magnetic Resonance Imaging - methods</topic><topic>Medical imaging</topic><topic>Model testing</topic><topic>Neural networks</topic><topic>Neurodegenerative diseases</topic><topic>Neuroimaging</topic><topic>Nucleotides</topic><topic>Older people</topic><topic>Phenotypes</topic><topic>Problem Solving Protocol</topic><topic>Single-nucleotide polymorphism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jo, Taeho</creatorcontrib><creatorcontrib>Nho, Kwangsik</creatorcontrib><creatorcontrib>Bice, Paula</creatorcontrib><creatorcontrib>Saykin, Andrew J</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><creatorcontrib>For The Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer 
and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Briefings in bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jo, Taeho</au><au>Nho, Kwangsik</au><au>Bice, Paula</au><au>Saykin, Andrew J</au><aucorp>Alzheimer’s Disease Neuroimaging Initiative</aucorp><aucorp>For The Alzheimer’s Disease Neuroimaging Initiative</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2022-03-10</date><risdate>2022</risdate><volume>23</volume><issue>2</issue><issn>1467-5463</issn><issn>1477-4054</issn><eissn>1477-4054</eissn><abstract>Abstract Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. 
Here we propose a novel three-step approach (SWAT-CNN) for identification of genetic variants using deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) including (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was compatible with traditional machine learning approaches, random forest and XGBoost. SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and a classification model for AD and may hold promise for a range of biomedical applications.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>35183061</pmid><doi>10.1093/bib/bbac022</doi><orcidid>https://orcid.org/0000-0002-7624-3872</orcidid><orcidid>https://orcid.org/0000-0002-1376-8532</orcidid><orcidid>https://orcid.org/0000-0001-5357-9966</orcidid><orcidid>https://orcid.org/0000-0003-1765-5735</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1467-5463
ispartof Briefings in bioinformatics, 2022-03, Vol.23 (2)
issn 1467-5463
1477-4054
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8921609
source Oxford Journals Open Access Collection
subjects Aged
Alzheimer Disease - genetics
Alzheimer's disease
Apolipoprotein E
Artificial neural networks
Biomedical materials
Classification
Cognitive Dysfunction - genetics
Deep Learning
Feature extraction
Fragments
Genetic diversity
Genetic variance
Genome-wide association studies
Genome-Wide Association Study
Genomes
Genotype & phenotype
Humans
Machine learning
Magnetic Resonance Imaging - methods
Medical imaging
Model testing
Neural networks
Neurodegenerative diseases
Neuroimaging
Nucleotides
Older people
Phenotypes
Problem Solving Protocol
Single-nucleotide polymorphism
title Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T17%3A06%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20learning-based%20identification%20of%20genetic%20variants:%20application%20to%20Alzheimer%E2%80%99s%20disease%20classification&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Jo,%20Taeho&rft.aucorp=Alzheimer%E2%80%99s%20Disease%20Neuroimaging%20Initiative&rft.date=2022-03-10&rft.volume=23&rft.issue=2&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbac022&rft_dat=%3Cproquest_TOX%3E2640678751%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2640678751&rft_id=info:pmid/35183061&rft_oup_id=10.1093/bib/bbac022&rfr_iscdi=true