Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification
Abstract Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) for identif...
Published in: | Briefings in bioinformatics 2022-03, Vol.23 (2) |
---|---|
Main authors: | Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew J |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | 2 |
container_start_page | |
container_title | Briefings in bioinformatics |
container_volume | 23 |
creator | Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew J |
description | Abstract
Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Applying deep learning to genome-wide association studies (GWAS) is challenging because of the high dimensionality of genomic data. Here we propose a novel three-step deep learning approach (SWAT-CNN) that identifies phenotype-related single nucleotide polymorphisms (SNPs), which can then be used to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran a convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran a CNN on the selected fragments to calculate phenotype influence scores (PIS) and identified phenotype-associated SNPs based on PIS. In the third step, we ran a CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (N = 981; cognitively normal older adults (CN) = 650 and Alzheimer’s disease (AD) = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was comparable to that of traditional machine learning approaches (random forest and XGBoost). SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and yielded a classification model for AD, and it may hold promise for a range of biomedical applications. |
doi_str_mv | 10.1093/bib/bbac022 |
format | Article |
fullrecord | <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8921609</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bib/bbac022</oup_id><sourcerecordid>2640678751</sourcerecordid><originalsourceid>FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</originalsourceid><addsrcrecordid>eNp9kU1rFTEUhoMotl5duZeAIEIZm6-ZTLoQSrW2UHCj63CSOXObOncyJjMFXfk3_Hv-ElPu7UVduDqB8_Bw3ryEPOfsDWdGHrvgjp0Dz4R4QA650rpSrFYP796NrmrVyAPyJOcbxgTTLX9MDmTNW8kafki-vEOc6ICQxjCuKwcZOxo6HOfQBw9ziCONPV3jiHPw9BZSgHHOJxSmabgH5khPh-_XGDaYfv34mWkXMhYT9QPkvBc9JY96GDI-280V-Xz-_tPZRXX18cPl2elV5ZVicyWkYQZaaIWXnHtQvcBOIxOKG2dqIZ30wjMUqtaKQQmiDa9B1d5wpxqUK_J2650Wt8HOlzAJBjulsIH0zUYI9u_NGK7tOt7a1gjelC9dkdc7QYpfF8yz3YTscRhgxLhkKxrJDDfaiIK-_Ae9iUsaS7xCKdboVte8UEdbyqeYc8J-fwxn9q5EW0q0uxIL_eLP-_fsfWsFeLUF4jL91_Qbgp6n2Q</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2640678751</pqid></control><display><type>article</type><title>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</title><source>Oxford Journals Open Access Collection</source><creator>Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew J</creator><creatorcontrib>Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew J ; Alzheimer’s Disease Neuroimaging Initiative ; For The Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><description>Abstract
Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) for identification of genetic variants using deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) including (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was compatible with traditional machine learning approaches, random forest and XGBoost. 
SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and a classification model for AD and may hold promise for a range of biomedical applications.</description><identifier>ISSN: 1467-5463</identifier><identifier>ISSN: 1477-4054</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbac022</identifier><identifier>PMID: 35183061</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Aged ; Alzheimer Disease - genetics ; Alzheimer's disease ; Apolipoprotein E ; Artificial neural networks ; Biomedical materials ; Classification ; Cognitive Dysfunction - genetics ; Deep Learning ; Feature extraction ; Fragments ; Genetic diversity ; Genetic variance ; Genome-wide association studies ; Genome-Wide Association Study ; Genomes ; Genotype & phenotype ; Humans ; Machine learning ; Magnetic Resonance Imaging - methods ; Medical imaging ; Model testing ; Neural networks ; Neurodegenerative diseases ; Neuroimaging ; Nucleotides ; Older people ; Phenotypes ; Problem Solving Protocol ; Single-nucleotide polymorphism</subject><ispartof>Briefings in bioinformatics, 2022-03, Vol.23 (2)</ispartof><rights>The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2022</rights><rights>The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.</rights><rights>The Author(s) 2022. Published by Oxford University Press. All rights reserved. 
For Permissions, please email: journals.permissions@oup.com</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</citedby><cites>FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</cites><orcidid>0000-0002-7624-3872 ; 0000-0002-1376-8532 ; 0000-0001-5357-9966 ; 0000-0003-1765-5735</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921609/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921609/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,724,777,781,882,1599,27905,27906,53772,53774</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bib/bbac022$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35183061$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Jo, Taeho</creatorcontrib><creatorcontrib>Nho, Kwangsik</creatorcontrib><creatorcontrib>Bice, Paula</creatorcontrib><creatorcontrib>Saykin, Andrew J</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><creatorcontrib>For The Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><title>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Abstract
Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) for identification of genetic variants using deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) including (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was compatible with traditional machine learning approaches, random forest and XGBoost. 
SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and a classification model for AD and may hold promise for a range of biomedical applications.</description><subject>Aged</subject><subject>Alzheimer Disease - genetics</subject><subject>Alzheimer's disease</subject><subject>Apolipoprotein E</subject><subject>Artificial neural networks</subject><subject>Biomedical materials</subject><subject>Classification</subject><subject>Cognitive Dysfunction - genetics</subject><subject>Deep Learning</subject><subject>Feature extraction</subject><subject>Fragments</subject><subject>Genetic diversity</subject><subject>Genetic variance</subject><subject>Genome-wide association studies</subject><subject>Genome-Wide Association Study</subject><subject>Genomes</subject><subject>Genotype & phenotype</subject><subject>Humans</subject><subject>Machine learning</subject><subject>Magnetic Resonance Imaging - methods</subject><subject>Medical imaging</subject><subject>Model testing</subject><subject>Neural networks</subject><subject>Neurodegenerative diseases</subject><subject>Neuroimaging</subject><subject>Nucleotides</subject><subject>Older people</subject><subject>Phenotypes</subject><subject>Problem Solving Protocol</subject><subject>Single-nucleotide 
polymorphism</subject><issn>1467-5463</issn><issn>1477-4054</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kU1rFTEUhoMotl5duZeAIEIZm6-ZTLoQSrW2UHCj63CSOXObOncyJjMFXfk3_Hv-ElPu7UVduDqB8_Bw3ryEPOfsDWdGHrvgjp0Dz4R4QA650rpSrFYP796NrmrVyAPyJOcbxgTTLX9MDmTNW8kafki-vEOc6ICQxjCuKwcZOxo6HOfQBw9ziCONPV3jiHPw9BZSgHHOJxSmabgH5khPh-_XGDaYfv34mWkXMhYT9QPkvBc9JY96GDI-280V-Xz-_tPZRXX18cPl2elV5ZVicyWkYQZaaIWXnHtQvcBOIxOKG2dqIZ30wjMUqtaKQQmiDa9B1d5wpxqUK_J2650Wt8HOlzAJBjulsIH0zUYI9u_NGK7tOt7a1gjelC9dkdc7QYpfF8yz3YTscRhgxLhkKxrJDDfaiIK-_Ae9iUsaS7xCKdboVte8UEdbyqeYc8J-fwxn9q5EW0q0uxIL_eLP-_fsfWsFeLUF4jL91_Qbgp6n2Q</recordid><startdate>20220310</startdate><enddate>20220310</enddate><creator>Jo, Taeho</creator><creator>Nho, Kwangsik</creator><creator>Bice, Paula</creator><creator>Saykin, Andrew J</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>K9.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-7624-3872</orcidid><orcidid>https://orcid.org/0000-0002-1376-8532</orcidid><orcidid>https://orcid.org/0000-0001-5357-9966</orcidid><orcidid>https://orcid.org/0000-0003-1765-5735</orcidid></search><sort><creationdate>20220310</creationdate><title>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</title><author>Jo, Taeho ; Nho, Kwangsik ; Bice, Paula ; Saykin, Andrew 
J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c440t-23909a8a82c311ca4f2ed7e02419b9523b3c2c0e245740a8307915a45c91b46e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Aged</topic><topic>Alzheimer Disease - genetics</topic><topic>Alzheimer's disease</topic><topic>Apolipoprotein E</topic><topic>Artificial neural networks</topic><topic>Biomedical materials</topic><topic>Classification</topic><topic>Cognitive Dysfunction - genetics</topic><topic>Deep Learning</topic><topic>Feature extraction</topic><topic>Fragments</topic><topic>Genetic diversity</topic><topic>Genetic variance</topic><topic>Genome-wide association studies</topic><topic>Genome-Wide Association Study</topic><topic>Genomes</topic><topic>Genotype & phenotype</topic><topic>Humans</topic><topic>Machine learning</topic><topic>Magnetic Resonance Imaging - methods</topic><topic>Medical imaging</topic><topic>Model testing</topic><topic>Neural networks</topic><topic>Neurodegenerative diseases</topic><topic>Neuroimaging</topic><topic>Nucleotides</topic><topic>Older people</topic><topic>Phenotypes</topic><topic>Problem Solving Protocol</topic><topic>Single-nucleotide polymorphism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jo, Taeho</creatorcontrib><creatorcontrib>Nho, Kwangsik</creatorcontrib><creatorcontrib>Bice, Paula</creatorcontrib><creatorcontrib>Saykin, Andrew J</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><creatorcontrib>For The Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and 
Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Briefings in bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jo, Taeho</au><au>Nho, Kwangsik</au><au>Bice, Paula</au><au>Saykin, Andrew J</au><aucorp>Alzheimer’s Disease Neuroimaging Initiative</aucorp><aucorp>For The Alzheimer’s Disease Neuroimaging Initiative</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2022-03-10</date><risdate>2022</risdate><volume>23</volume><issue>2</issue><issn>1467-5463</issn><issn>1477-4054</issn><eissn>1477-4054</eissn><abstract>Abstract
Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) for identification of genetic variants using deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) including (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was compatible with traditional machine learning approaches, random forest and XGBoost. SWAT-CNN, a novel deep learning–based genome-wide approach, identified AD-associated SNPs and a classification model for AD and may hold promise for a range of biomedical applications.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>35183061</pmid><doi>10.1093/bib/bbac022</doi><orcidid>https://orcid.org/0000-0002-7624-3872</orcidid><orcidid>https://orcid.org/0000-0002-1376-8532</orcidid><orcidid>https://orcid.org/0000-0001-5357-9966</orcidid><orcidid>https://orcid.org/0000-0003-1765-5735</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1467-5463 |
ispartof | Briefings in bioinformatics, 2022-03, Vol.23 (2) |
issn | 1467-5463 ; 1477-4054 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8921609 |
source | Oxford Journals Open Access Collection |
subjects | Aged ; Alzheimer Disease - genetics ; Alzheimer's disease ; Apolipoprotein E ; Artificial neural networks ; Biomedical materials ; Classification ; Cognitive Dysfunction - genetics ; Deep Learning ; Feature extraction ; Fragments ; Genetic diversity ; Genetic variance ; Genome-wide association studies ; Genome-Wide Association Study ; Genomes ; Genotype & phenotype ; Humans ; Machine learning ; Magnetic Resonance Imaging - methods ; Medical imaging ; Model testing ; Neural networks ; Neurodegenerative diseases ; Neuroimaging ; Nucleotides ; Older people ; Phenotypes ; Problem Solving Protocol ; Single-nucleotide polymorphism |
title | Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T17%3A06%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20learning-based%20identification%20of%20genetic%20variants:%20application%20to%20Alzheimer%E2%80%99s%20disease%20classification&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Jo,%20Taeho&rft.aucorp=Alzheimer%E2%80%99s%20Disease%20Neuroimaging%20Initiative&rft.date=2022-03-10&rft.volume=23&rft.issue=2&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbac022&rft_dat=%3Cproquest_TOX%3E2640678751%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2640678751&rft_id=info:pmid/35183061&rft_oup_id=10.1093/bib/bbac022&rfr_iscdi=true |
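
The abstract in this record describes two ways of partitioning SNPs: nonoverlapping fragments in the first step and sliding windows within selected fragments in the second. The following is a minimal sketch of that index bookkeeping only, not the authors' implementation. The function names, the half-open `(start, end)` index convention, and the `stride` parameter are assumptions made for illustration. The record does not say how the CNN scores a fragment or how the phenotype influence score is computed, so neither is shown.

```python
from typing import List, Tuple


def nonoverlapping_fragments(n_snps: int, fragment_size: int) -> List[Tuple[int, int]]:
    """Split SNP indices 0..n_snps-1 into consecutive half-open [start, end) fragments.

    Hypothetical helper: the last fragment may be shorter than fragment_size.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        (start, min(start + fragment_size, n_snps))
        for start in range(0, n_snps, fragment_size)
    ]


def sliding_windows(
    start: int, end: int, window_size: int, stride: int = 1
) -> List[Tuple[int, int]]:
    """Enumerate overlapping half-open windows inside one fragment [start, end).

    Hypothetical helper: if the fragment is shorter than window_size,
    a single window covering the whole fragment is returned.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = max(start, end - window_size)
    return [
        (w, min(w + window_size, end))
        for w in range(start, last_start + 1, stride)
    ]


# Toy example: 10 SNPs, fragments of 4 SNPs, windows of 2 SNPs in the first fragment.
fragments = nonoverlapping_fragments(n_snps=10, fragment_size=4)
windows = sliding_windows(*fragments[0], window_size=2)
print(fragments)
print(windows)
```

In a pipeline along the lines the abstract describes, each fragment would be passed to a classifier to decide whether it is kept, and each window inside a kept fragment would then be scored. Those modeling steps are deliberately left out here.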