Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions

Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to pred...

Bibliographic details
Published in: BMC proceedings 2012-05, Vol.6 Suppl 2 (Suppl 2), p.S10-S10, Article S10
Main authors: Ogutu, Joseph O; Schulz-Streeck, Torben; Piepho, Hans-Peter
Format: Article
Language: eng
Online access: Full text
container_end_page S10
container_issue Suppl 2
container_start_page S10
container_title BMC proceedings
container_volume 6 Suppl 2
creator Ogutu, Joseph O
Schulz-Streeck, Torben
Piepho, Hans-Peter
description Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods (ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net) for predicting GEBV using dense SNP markers. We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all six methods, which use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error and the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV).
The elastic net, lasso, adaptive lasso and the adaptive elastic net all had similar accuracies but outperformed ridge regression and ridge regression BLUP in terms of the Pearson correlation between predicted GEBVs and the true genomic value as well as the root mean squared error. The performance of RR-BLUP was also somewhat better than that of ridge regression. This pattern was replicated by the Pearson correlation between predicted GEBVs and the true breeding values (TBV) and the root mean squared error calculated with respect to TBV, except that accuracy was lower for all models, most especially for the adaptive elastic net. The correlation between the predicted GEBV and simulated phenotypic values based on the fivefold CV also revealed a similar pattern except that the adaptive elastic net had lower accuracy than both ridge regression methods. All six models had relatively high prediction accuracies for the simulated data set. Accuracy was higher for the lasso-type methods than for ridge regression and ridge regression BLUP.
doi_str_mv 10.1186/1753-6561-6-S2-S10
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3363152</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1753470013</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1014440955</pqid></control>
<display><type>article</type><title>Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions</title><source>DOAJ Directory of Open Access Journals</source><source>SpringerNature Journals</source><source>PubMed Central Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Ogutu, Joseph O ; Schulz-Streeck, Torben ; Piepho, Hans-Peter</creator><description>Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods (ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net) for predicting GEBV using dense SNP markers. We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all six methods, which use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error and the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). The elastic net, lasso, adaptive lasso and the adaptive elastic net all had similar accuracies but outperformed ridge regression and ridge regression BLUP in terms of the Pearson correlation between predicted GEBVs and the true genomic value as well as the root mean squared error. The performance of RR-BLUP was also somewhat better than that of ridge regression. This pattern was replicated by the Pearson correlation between predicted GEBVs and the true breeding values (TBV) and the root mean squared error calculated with respect to TBV, except that accuracy was lower for all models, most especially for the adaptive elastic net. The correlation between the predicted GEBV and simulated phenotypic values based on the fivefold CV also revealed a similar pattern except that the adaptive elastic net had lower accuracy than both ridge regression methods. All six models had relatively high prediction accuracies for the simulated data set. Accuracy was higher for the lasso-type methods than for ridge regression and ridge regression BLUP.</description><identifier>ISSN: 1753-6561</identifier><identifier>EISSN: 1753-6561</identifier><identifier>DOI: 10.1186/1753-6561-6-S2-S10</identifier><identifier>PMID: 22640436</identifier><language>eng</language><publisher>England: BioMed Central</publisher><subject>Breeding of animals ; Crop science ; Fines &amp; penalties ; Proceedings ; Studies</subject><ispartof>BMC proceedings, 2012-05, Vol.6 Suppl 2 (Suppl 2), p.S10-S10, Article S10</ispartof><rights>2012 Ogutu et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa></display>
<links><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3363152/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3363152/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/22640436$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links>
<addata><au>Ogutu, Joseph O</au><au>Schulz-Streeck, Torben</au><au>Piepho, Hans-Peter</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions</atitle><jtitle>BMC proceedings</jtitle><addtitle>BMC Proc</addtitle><date>2012-05-21</date><risdate>2012</risdate><volume>6 Suppl 2</volume><issue>Suppl 2</issue><spage>S10</spage><epage>S10</epage><pages>S10-S10</pages><artnum>S10</artnum><issn>1753-6561</issn><eissn>1753-6561</eissn><cop>England</cop><pub>BioMed Central</pub><pmid>22640436</pmid><doi>10.1186/1753-6561-6-S2-S10</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1753-6561
ispartof BMC proceedings, 2012-05, Vol.6 Suppl 2 (Suppl 2), p.S10-S10, Article S10
issn 1753-6561
1753-6561
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3363152
source DOAJ Directory of Open Access Journals; SpringerNature Journals; PubMed Central Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Breeding of animals
Crop science
Fines & penalties
Proceedings
Studies
title Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T01%3A53%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Genomic%20selection%20using%20regularized%20linear%20regression%20models:%20ridge%20regression,%20lasso,%20elastic%20net%20and%20their%20extensions&rft.jtitle=BMC%20proceedings&rft.au=Ogutu,%20Joseph%20O&rft.date=2012-05-21&rft.volume=6%20Suppl%202&rft.issue=Suppl%202&rft.spage=S10&rft.epage=S10&rft.pages=S10-S10&rft.artnum=S10&rft.issn=1753-6561&rft.eissn=1753-6561&rft_id=info:doi/10.1186/1753-6561-6-S2-S10&rft_dat=%3Cproquest_pubme%3E1753470013%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1014440955&rft_id=info:pmid/22640436&rfr_iscdi=true
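
The abstract in this record compares penalized regressions by the Pearson correlation and root mean squared error between predicted and true genomic values. The following is a minimal Python sketch of that kind of comparison, not the authors' code. Everything in it is an assumption made for illustration: the toy simulated genotypes (not the QTL-MAS 2011 data), the coordinate-descent solver, the penalty parametrization (lam times [alpha times the L1 norm plus (1 - alpha)/2 times the squared L2 norm], so alpha = 0 is ridge-like and alpha = 1 is the lasso), and all function names. The adaptive variants and RR-BLUP are not shown.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=50):
    """Cyclic coordinate descent for an elastic-net penalized least squares fit.

    X is a list of rows (assumed centered), y a centered response.
    alpha=1 gives the lasso penalty, alpha=0 a ridge-type penalty.
    A fixed number of sweeps is run; convergence is not checked.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant marker carries no information
            # Covariance of marker j with the residual that excludes marker j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0.0 and vb > 0.0 else 0.0


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def demo():
    # Toy data: 0/1/2 genotypes, 5 causal markers with effect 1.0 (hypothetical).
    rng = random.Random(1)
    n, n_train, p, n_qtl = 90, 60, 120, 5
    X = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
    tgv = [float(sum(row[:n_qtl])) for row in X]       # true genomic value
    y = [g + rng.gauss(0.0, 1.0) for g in tgv]         # phenotype = TGV + noise
    # Center markers and phenotype using the training individuals only.
    means = [sum(X[i][j] for i in range(n_train)) / n_train for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y[:n_train]) / n_train
    yc = [v - y_mean for v in y[:n_train]]
    settings = [("ridge-like", 1.0, 0.0), ("elastic net", 0.2, 0.5), ("lasso", 0.2, 1.0)]
    for name, lam, alpha in settings:
        beta = elastic_net(Xc[:n_train], yc, lam, alpha)
        pred = [y_mean + sum(b * x for b, x in zip(beta, row)) for row in Xc[n_train:]]
        kept = sum(1 for b in beta if b != 0.0)
        print(f"{name:12s} markers kept={kept:3d} "
              f"r(GEBV,TGV)={pearson(pred, tgv[n_train:]):.3f} "
              f"RMSE={rmse(pred, tgv[n_train:]):.3f}")


if __name__ == "__main__":
    demo()
```

The soft-thresholding step is what lets the lasso and elastic net set some marker effects exactly to zero, which is the "simultaneous selection and estimation" property the abstract mentions; with alpha = 0 no coefficient is thresholded, so all markers are retained and only shrunk. The penalty values in `demo` are arbitrary; in practice they would be tuned, for example by cross-validation.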