Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes

Bibliographic details
Published in: G3 : genes - genomes - genetics, 2024-11, Vol. 14 (11)
Main authors: Kihlman, Ragini, Launonen, Ilkka, Sillanpää, Mikko J, Waldmann, Patrik
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 11
container_start_page
container_title G3 : genes - genomes - genetics
container_volume 14
creator Kihlman, Ragini
Launonen, Ilkka
Sillanpää, Mikko J
Waldmann, Patrik
description In genomics, the use of deep learning (DL) is growing rapidly, and DL has demonstrated its ability to uncover complex relationships in large biological and biomedical data sets. With the development of high-throughput sequencing techniques, genomic markers can now be allocated to large sections of a genome. By analyzing allele sharing between individuals, one can calculate realized genomic relationships from single-nucleotide polymorphism (SNP) data rather than relying on known pedigree relationships under a polygenic model. Traditional approaches to genome-wide prediction (GWP) of quantitative phenotypes use genomic relationships in a fixed global covariance model, possibly with a nonlinear kernel mapping (for example, Gaussian processes). By contrast, the DL approaches proposed so far for GWP do not take into account the non-Euclidean graph structure of relationships between individuals over several generations. In this paper, we propose one global convolutional neural network (GCN) and one local sub-sampling architecture (GCN-RS), both specifically designed to perform regression analysis based on genomic relationship information. A GCN is tailored to non-Euclidean spaces and consists of several layers of graph convolutions; through these layers, it maps input genomic markers to their quantitative phenotype values. The GCN-RS architecture is designed to further improve the GCN’s performance by sub-sampling the graph to reduce the dimensionality of the input data. The graphs are constructed using an iterative nearest-neighbor approach. Comparisons show that GCN-RS outperforms the popular Genomic Best Linear Unbiased Predictor (GBLUP) method on one simulated dataset and three real datasets (wheat, mice, and pigs), with predictive improvements of 4.4% to 49.4% in test mean squared error. This indicates that GCN-RS is a promising tool for genomic prediction in plants and animals. Furthermore, GCN-RS is computationally efficient, making it a viable option for large-scale applications.
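The abstract outlines a three-step pipeline: realized genomic relationships computed from SNP data, a nearest-neighbor graph built from those relationships, and stacked graph convolutions that map markers to phenotypes. The sketch below illustrates these steps under explicit assumptions and is not the authors' implementation. The relationship matrix follows VanRaden's first method. The graph is a plain single-pass k-nearest-neighbor graph, standing in for the iterative procedure, whose details are not given here. Each layer uses the Kipf and Welling propagation rule. The function names, k = 5, the layer sizes, and the simulated genotypes are all hypothetical. The weights are random, so training is not shown.

```python
import numpy as np

# Hypothetical sketch: VanRaden-style G, a plain (non-iterative) kNN graph,
# and a Kipf & Welling GCN forward pass. Not the authors' implementation.

def genomic_relationship(M):
    """Realized relationship matrix (VanRaden method 1) from an n x m SNP
    matrix coded 0/1/2 (allele copies per individual and marker)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0             # allele frequency of each SNP
    Z = M - 2.0 * p                      # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))  # scales G to resemble a pedigree matrix
    return Z @ Z.T / denom

def knn_adjacency(G, k):
    """Link each individual to the k others with the highest genomic
    relationship, then symmetrize so the graph is undirected."""
    n = G.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        rel = G[i].copy()
        rel[i] = -np.inf                 # never pick the individual itself
        nbrs = np.argsort(rel)[-k:]      # indices of the k largest relationships
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)

def normalized_adjacency(A):
    """Propagation matrix D^-1/2 (A + I) D^-1/2 with self-loops added."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, weights):
    """Stacked graph convolutions: ReLU on hidden layers, linear output
    layer giving one real-valued phenotype prediction per individual."""
    H = X
    for i, W in enumerate(weights):
        H = A_norm @ H @ W
        if i < len(weights) - 1:
            H = np.maximum(H, 0.0)
    return H[:, 0]

# Simulated, hypothetical data: 50 individuals, 200 SNPs.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(50, 200))
G = genomic_relationship(M)
A = knn_adjacency(G, k=5)
A_norm = normalized_adjacency(A)
X = M - M.mean(axis=0)                   # centered markers as node features
weights = [rng.normal(scale=0.05, size=(200, 16)),
           rng.normal(scale=0.1, size=(16, 1))]
y_hat = gcn_forward(A_norm, X, weights)
```

Using the markers as node features follows the abstract's statement that the GCN maps input genomic markers to phenotype values; the graph only determines which individuals share information in each layer. A sub-sampling variant such as GCN-RS would operate on reduced subgraphs, which this sketch does not attempt.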
doi_str_mv 10.1093/g3journal/jkae216
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11540326</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/g3journal/jkae216</oup_id><sourcerecordid>3102470854</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-eaf383452d384b0a32ca53bc94d431ab73990b3d80d38d7f5d9f2ff0fecdaf3b3</originalsourceid><addsrcrecordid>eNqNkUtPAyEUhYnR2Kb2B7gxs3ThWBiGeayMaXwlTVxY14RhYEqdgSkwNf33Ylob3cmCS3K_c7i5B4BLBG8RLPGswWszWM3a2fqDiQRlJ2AcbhijAmenv94jMHVuDcMhJMvS7ByMcJkQmJN8DJZvQxU71vWt0k3UWNavIi0Gy9pQ_KexHy6SxkaN0KZTPOqtqBX3yujIyGgzMO2VZ15tRdSvAuN3vXAX4Eyy1onpoU7A--PDcv4cL16fXub3i5hjVPpYMIkLnJKkxkVaQYYTzgiueJnWKUasynFZwgrXBQxAnUtSlzKREkrB6yCt8ATc7X37oepEzYX2YXDaW9Uxu6OGKfq3o9WKNmZLESIpxEkWHK4PDtZsBuE87ZTjom2ZFmZwFCOYpDksSBpQtEe5Nc5ZIY__IEi_E6HHROghkaC5-j3gUfGz_wDc7AEz9P_w-wL6Up1N</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3102470854</pqid></control><display><type>article</type><title>Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes</title><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><creator>Kihlman, Ragini ; Launonen, Ilkka ; Sillanpää, Mikko J ; Waldmann, Patrik</creator><contributor>de Koning, D-J</contributor><creatorcontrib>Kihlman, Ragini ; Launonen, Ilkka ; Sillanpää, Mikko J ; Waldmann, Patrik ; de Koning, D-J</creatorcontrib><description>Abstract In genomics, use of deep learning (DL) is rapidly growing and DL has successfully demonstrated its ability to uncover complex relationships in large biological and biomedical data sets. With the development of high-throughput sequencing techniques, genomic markers can now be allocated to large sections of a genome. 
By analyzing allele sharing between individuals, one may calculate realized genomic relationships from single-nucleotide polymorphisms (SNPs) data rather than relying on known pedigree relationships under polygenic model. The traditional approaches in genome-wide prediction (GWP) of quantitative phenotypes utilize genomic relationships in fixed global covariance modeling, possibly with some nonlinear kernel mapping (for example Gaussian processes). On the other hand, the DL approaches proposed so far for GWP fail to take into account the non-Euclidean graph structure of relationships between individuals over several generations. In this paper, we propose one global convolutional neural network (GCN) and one local sub-sampling architecture (GCN-RS) that are specifically designed to perform regression analysis based on genomic relationship information. A GCN is tailored to non-Euclidean spaces and consists of several layers of graph convolutions. The GCN-RS architecture is designed to further improve the GCN’s performance by sub-sampling the graph to reduce the dimensionality of the input data. Through these graph convolutional layers, the GCN maps input genomic markers to their quantitative phenotype values. The graphs are constructed using an iterative nearest neighbor approach. Comparisons show that the GCN-RS outperforms the popular Genomic Best Linear Unbiased Predictor method on one simulated and three real datasets from wheat, mice and pig with a predictive improvement of 4.4% to 49.4% in terms of test mean squared error. This indicates that GCN-RS is a promising tool for genomic predictions in plants and animals. 
Furthermore, GCN-RS is computationally efficient, making it a viable option for large-scale applications.</description><identifier>ISSN: 2160-1836</identifier><identifier>EISSN: 2160-1836</identifier><identifier>DOI: 10.1093/g3journal/jkae216</identifier><identifier>PMID: 39250757</identifier><language>eng</language><publisher>US: Oxford University Press</publisher><subject>Algorithms ; Animals ; Genome-Wide Association Study - methods ; Genomic Prediction ; Genomics - methods ; Models, Genetic ; Neural Networks, Computer ; Phenotype ; Polymorphism, Single Nucleotide ; Quantitative Trait Loci ; Triticum - genetics</subject><ispartof>G3 : genes - genomes - genetics, 2024-11, Vol.14 (11)</ispartof><rights>The Author(s) 2024. Published by Oxford University Press on behalf of The Genetics Society of America. 2024</rights><rights>The Author(s) 2024. Published by Oxford University Press on behalf of The Genetics Society of America.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c319t-eaf383452d384b0a32ca53bc94d431ab73990b3d80d38d7f5d9f2ff0fecdaf3b3</cites><orcidid>0000-0003-2390-6609 ; 0000-0003-2808-2768</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540326/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540326/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,1603,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39250757$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>de Koning, D-J</contributor><creatorcontrib>Kihlman, Ragini</creatorcontrib><creatorcontrib>Launonen, 
Ilkka</creatorcontrib><creatorcontrib>Sillanpää, Mikko J</creatorcontrib><creatorcontrib>Waldmann, Patrik</creatorcontrib><title>Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes</title><title>G3 : genes - genomes - genetics</title><addtitle>G3 (Bethesda)</addtitle><description>Abstract In genomics, use of deep learning (DL) is rapidly growing and DL has successfully demonstrated its ability to uncover complex relationships in large biological and biomedical data sets. With the development of high-throughput sequencing techniques, genomic markers can now be allocated to large sections of a genome. By analyzing allele sharing between individuals, one may calculate realized genomic relationships from single-nucleotide polymorphisms (SNPs) data rather than relying on known pedigree relationships under polygenic model. The traditional approaches in genome-wide prediction (GWP) of quantitative phenotypes utilize genomic relationships in fixed global covariance modeling, possibly with some nonlinear kernel mapping (for example Gaussian processes). On the other hand, the DL approaches proposed so far for GWP fail to take into account the non-Euclidean graph structure of relationships between individuals over several generations. In this paper, we propose one global convolutional neural network (GCN) and one local sub-sampling architecture (GCN-RS) that are specifically designed to perform regression analysis based on genomic relationship information. A GCN is tailored to non-Euclidean spaces and consists of several layers of graph convolutions. The GCN-RS architecture is designed to further improve the GCN’s performance by sub-sampling the graph to reduce the dimensionality of the input data. Through these graph convolutional layers, the GCN maps input genomic markers to their quantitative phenotype values. The graphs are constructed using an iterative nearest neighbor approach. 
Comparisons show that the GCN-RS outperforms the popular Genomic Best Linear Unbiased Predictor method on one simulated and three real datasets from wheat, mice and pig with a predictive improvement of 4.4% to 49.4% in terms of test mean squared error. This indicates that GCN-RS is a promising tool for genomic predictions in plants and animals. Furthermore, GCN-RS is computationally efficient, making it a viable option for large-scale applications.</description><subject>Algorithms</subject><subject>Animals</subject><subject>Genome-Wide Association Study - methods</subject><subject>Genomic Prediction</subject><subject>Genomics - methods</subject><subject>Models, Genetic</subject><subject>Neural Networks, Computer</subject><subject>Phenotype</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Quantitative Trait Loci</subject><subject>Triticum - genetics</subject><issn>2160-1836</issn><issn>2160-1836</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkUtPAyEUhYnR2Kb2B7gxs3ThWBiGeayMaXwlTVxY14RhYEqdgSkwNf33Ylob3cmCS3K_c7i5B4BLBG8RLPGswWszWM3a2fqDiQRlJ2AcbhijAmenv94jMHVuDcMhJMvS7ByMcJkQmJN8DJZvQxU71vWt0k3UWNavIi0Gy9pQ_KexHy6SxkaN0KZTPOqtqBX3yujIyGgzMO2VZ15tRdSvAuN3vXAX4Eyy1onpoU7A--PDcv4cL16fXub3i5hjVPpYMIkLnJKkxkVaQYYTzgiueJnWKUasynFZwgrXBQxAnUtSlzKREkrB6yCt8ATc7X37oepEzYX2YXDaW9Uxu6OGKfq3o9WKNmZLESIpxEkWHK4PDtZsBuE87ZTjom2ZFmZwFCOYpDksSBpQtEe5Nc5ZIY__IEi_E6HHROghkaC5-j3gUfGz_wDc7AEz9P_w-wL6Up1N</recordid><startdate>20241106</startdate><enddate>20241106</enddate><creator>Kihlman, Ragini</creator><creator>Launonen, Ilkka</creator><creator>Sillanpää, Mikko J</creator><creator>Waldmann, Patrik</creator><general>Oxford University 
Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-2390-6609</orcidid><orcidid>https://orcid.org/0000-0003-2808-2768</orcidid></search><sort><creationdate>20241106</creationdate><title>Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes</title><author>Kihlman, Ragini ; Launonen, Ilkka ; Sillanpää, Mikko J ; Waldmann, Patrik</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-eaf383452d384b0a32ca53bc94d431ab73990b3d80d38d7f5d9f2ff0fecdaf3b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Animals</topic><topic>Genome-Wide Association Study - methods</topic><topic>Genomic Prediction</topic><topic>Genomics - methods</topic><topic>Models, Genetic</topic><topic>Neural Networks, Computer</topic><topic>Phenotype</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Quantitative Trait Loci</topic><topic>Triticum - genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kihlman, Ragini</creatorcontrib><creatorcontrib>Launonen, Ilkka</creatorcontrib><creatorcontrib>Sillanpää, Mikko J</creatorcontrib><creatorcontrib>Waldmann, Patrik</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>G3 : genes - genomes - genetics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kihlman, Ragini</au><au>Launonen, Ilkka</au><au>Sillanpää, Mikko J</au><au>Waldmann, Patrik</au><au>de Koning, D-J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes</atitle><jtitle>G3 : genes - genomes - genetics</jtitle><addtitle>G3 (Bethesda)</addtitle><date>2024-11-06</date><risdate>2024</risdate><volume>14</volume><issue>11</issue><issn>2160-1836</issn><eissn>2160-1836</eissn><abstract>Abstract In genomics, use of deep learning (DL) is rapidly growing and DL has successfully demonstrated its ability to uncover complex relationships in large biological and biomedical data sets. With the development of high-throughput sequencing techniques, genomic markers can now be allocated to large sections of a genome. By analyzing allele sharing between individuals, one may calculate realized genomic relationships from single-nucleotide polymorphisms (SNPs) data rather than relying on known pedigree relationships under polygenic model. The traditional approaches in genome-wide prediction (GWP) of quantitative phenotypes utilize genomic relationships in fixed global covariance modeling, possibly with some nonlinear kernel mapping (for example Gaussian processes). On the other hand, the DL approaches proposed so far for GWP fail to take into account the non-Euclidean graph structure of relationships between individuals over several generations. In this paper, we propose one global convolutional neural network (GCN) and one local sub-sampling architecture (GCN-RS) that are specifically designed to perform regression analysis based on genomic relationship information. A GCN is tailored to non-Euclidean spaces and consists of several layers of graph convolutions. 
The GCN-RS architecture is designed to further improve the GCN’s performance by sub-sampling the graph to reduce the dimensionality of the input data. Through these graph convolutional layers, the GCN maps input genomic markers to their quantitative phenotype values. The graphs are constructed using an iterative nearest neighbor approach. Comparisons show that the GCN-RS outperforms the popular Genomic Best Linear Unbiased Predictor method on one simulated and three real datasets from wheat, mice and pig with a predictive improvement of 4.4% to 49.4% in terms of test mean squared error. This indicates that GCN-RS is a promising tool for genomic predictions in plants and animals. Furthermore, GCN-RS is computationally efficient, making it a viable option for large-scale applications.</abstract><cop>US</cop><pub>Oxford University Press</pub><pmid>39250757</pmid><doi>10.1093/g3journal/jkae216</doi><orcidid>https://orcid.org/0000-0003-2390-6609</orcidid><orcidid>https://orcid.org/0000-0003-2808-2768</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2160-1836
ispartof G3 : genes - genomes - genetics, 2024-11, Vol.14 (11)
issn 2160-1836
2160-1836
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11540326
source MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford Journals Open Access Collection; PubMed Central
subjects Algorithms
Animals
Genome-Wide Association Study - methods
Genomic Prediction
Genomics - methods
Models, Genetic
Neural Networks, Computer
Phenotype
Polymorphism, Single Nucleotide
Quantitative Trait Loci
Triticum - genetics
title Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T16%3A54%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sub-sampling%20graph%20neural%20networks%20for%20genomic%20prediction%20of%20quantitative%20phenotypes&rft.jtitle=G3%20:%20genes%20-%20genomes%20-%20genetics&rft.au=Kihlman,%20Ragini&rft.date=2024-11-06&rft.volume=14&rft.issue=11&rft.issn=2160-1836&rft.eissn=2160-1836&rft_id=info:doi/10.1093/g3journal/jkae216&rft_dat=%3Cproquest_pubme%3E3102470854%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3102470854&rft_id=info:pmid/39250757&rft_oup_id=10.1093/g3journal/jkae216&rfr_iscdi=true
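The record's abstract compares GCN-RS with GBLUP and reports the gain as an improvement in test mean squared error. The sketch below shows one common way to compute both quantities. It rests on three assumptions. First, GBLUP uses a known variance ratio `lam` = sigma_e^2 / sigma_g^2; in practice this ratio is estimated, for example by REML. Second, the training mean serves as the intercept. Third, the improvement is the relative reduction in test MSE, since the abstract does not state the exact formula. All names and the simulated data are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a GBLUP baseline and a relative-MSE comparison.

def genomic_relationship(M):
    """VanRaden method-1 G matrix from an n x m SNP matrix coded 0/1/2."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, lam):
    """BLUP of test genetic values: mu + G_vt (G_tt + lam I)^-1 (y - mu).
    Assumes lam = sigma_e^2 / sigma_g^2 is known; mu is the training mean."""
    mu = y_train.mean()
    G_tt = G[np.ix_(train_idx, train_idx)]   # train x train relationships
    G_vt = G[np.ix_(test_idx, train_idx)]    # test x train relationships
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_vt @ alpha

def mse(y, y_hat):
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def relative_improvement(mse_baseline, mse_model):
    """Percentage reduction in test MSE relative to the baseline (e.g. GBLUP)."""
    return 100.0 * (mse_baseline - mse_model) / mse_baseline

# Simulated, hypothetical data: 120 individuals, 300 SNPs, 100/20 split.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(120, 300))
beta = rng.normal(scale=0.1, size=300)
y = (M - M.mean(axis=0)) @ beta + rng.normal(size=120)
train_idx, test_idx = np.arange(100), np.arange(100, 120)
G = genomic_relationship(M)
y_hat = gblup_predict(G, y[train_idx], train_idx, test_idx, lam=1.0)
test_mse = mse(y[test_idx], y_hat)
```

For example, with illustrative (not published) values, a baseline MSE of 1.00 and a model MSE of 0.956 give `relative_improvement(1.00, 0.956)` of about 4.4. Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse, and adding `lam * I` keeps the system well conditioned even when G is singular.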