Evaluation of polygenic prediction methodology within a reference-standardized framework
The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of var...
Published in: | PLoS genetics 2021-05, Vol.17 (5), p.e1009021 |
---|---|
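Main authors: | Pain, Oliver; Glanville, Kylie P; Hagenaars, Saskia P; Selzam, Saskia; Fürtjes, Anna E; Gaspar, Héléna A; Coleman, Jonathan R I; Rimfeld, Kaili; Breen, Gerome; Plomin, Robert; Folkersen, Lasse; Lewis, Cathryn M |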
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 5 |
container_start_page | e1009021 |
container_title | PLoS genetics |
container_volume | 17 |
creator | Pain, Oliver; Glanville, Kylie P; Hagenaars, Saskia P; Selzam, Saskia; Fürtjes, Anna E; Gaspar, Héléna A; Coleman, Jonathan R I; Rimfeld, Kaili; Breen, Gerome; Plomin, Robert; Folkersen, Lasse; Lewis, Cathryn M |
description | The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods. |
doi_str_mv | 10.1371/journal.pgen.1009021 |
format | Article |
fullrecord | <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2541858057</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A663948816</galeid><doaj_id>oai_doaj_org_article_16d89b298f4a47c48c8b77552d412603</doaj_id><sourcerecordid>A663948816</sourcerecordid><originalsourceid>FETCH-LOGICAL-c792t-2d7b52b1f3a9fe4ea805174fd1343a7bf9449ddf9c0a91aaa97f2a476e6efc33</originalsourceid><addsrcrecordid>eNqVk99r1TAUx4sobk7_A9GCIPpwr02aNs2LMMbUC8OBDvEtnOZHb2baXJN28_rXm-5241b2oOQh4eRzvic5P5LkOcqWKKfo3aUbfAd2uWlUt0RZxjKMHiSHqCjyBSUZebh3PkiehHCZZXlRMfo4OchzRuIdPky-n16BHaA3rkudTjfObqOeEenGK2nEjb1V_dpJZ12zTa9NvzZdCqlXWnnVCbUIPXQSvDS_lUy1h1ZdO__jafJIgw3q2bQfJRcfTi9OPi3Ozj-uTo7PFoIy3C-wpHWBa6RzYFoRBVVWIEq0RDnJgdaaEcKk1ExkwBAAMKoxEFqqUmmR50fJy53sxrrAp5wEjguCqiJq0UisdoR0cMk33rTgt9yB4TcG5xsOvjfCKo5KWbEas0qTGEKQSlQ1pUWBJUG4zMZo76doQ90qKVTXe7Az0flNZ9a8cVe8QhjhqogCbyYB734OKvS8NUEoa6FTbhjfjXHOSsRG9NVf6P2_m6gG4gdMp12MK0ZRflyWscxVhcpILe-h4pKqNcJ1Sptonzm8nTlEple_-gaGEPjq65f_YD__O3v-bc6-3mPXCmy_Ds4OY0-GOUh2oPAuhNiYdwVBGR9n5TZzfJwVPs1KdHuxX8w7p9vhyP8ARMQO-g</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2541858057</pqid></control><display><type>article</type><title>Evaluation of polygenic prediction methodology within a reference-standardized framework</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Public Library of Science (PLoS)</source><creator>Pain, Oliver ; Glanville, Kylie P ; Hagenaars, Saskia P ; Selzam, Saskia ; Fürtjes, Anna E ; Gaspar, Héléna A ; Coleman, Jonathan R I ; Rimfeld, Kaili ; Breen, Gerome ; Plomin, Robert ; Folkersen, Lasse ; Lewis, Cathryn M</creator><creatorcontrib>Pain, Oliver ; Glanville, Kylie P ; Hagenaars, Saskia P ; Selzam, Saskia ; Fürtjes, Anna E ; Gaspar, Héléna A ; Coleman, Jonathan 
R I ; Rimfeld, Kaili ; Breen, Gerome ; Plomin, Robert ; Folkersen, Lasse ; Lewis, Cathryn M</creatorcontrib><description>The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. 
This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.</description><identifier>ISSN: 1553-7404</identifier><identifier>ISSN: 1553-7390</identifier><identifier>EISSN: 1553-7404</identifier><identifier>DOI: 10.1371/journal.pgen.1009021</identifier><identifier>PMID: 33945532</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Biobanks ; Biology and Life Sciences ; Breast cancer ; Cardiovascular disease ; Computer Simulation ; Consortia ; Coronary artery ; Datasets as Topic ; Diabetes mellitus ; Estimates ; Ethics ; Genetic diversity ; Genetic research ; Genetic screening ; Genetic variation ; Genome-Wide Association Study ; Genotype ; Health risk assessment ; Heart diseases ; Humans ; Inflammatory bowel diseases ; Linkage disequilibrium ; Medicine and Health Sciences ; Methods ; Models, Genetic ; Multifactorial Inheritance - genetics ; Multiple sclerosis ; Physical Sciences ; Polymorphism, Single Nucleotide - genetics ; Precision Medicine ; Prostate cancer ; Reproducibility of Results ; Research and Analysis Methods ; Rheumatoid arthritis ; Sample size ; Statistics ; Twin Studies as Topic ; Twins - genetics ; United Kingdom</subject><ispartof>PLoS genetics, 2021-05, Vol.17 (5), p.e1009021</ispartof><rights>COPYRIGHT 2021 Public Library of Science</rights><rights>2021 Pain et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Pain et al 2021 Pain et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c792t-2d7b52b1f3a9fe4ea805174fd1343a7bf9449ddf9c0a91aaa97f2a476e6efc33</citedby><cites>FETCH-LOGICAL-c792t-2d7b52b1f3a9fe4ea805174fd1343a7bf9449ddf9c0a91aaa97f2a476e6efc33</cites><orcidid>0000-0001-8321-9435 ; 0000-0002-8249-8476 ; 0000-0001-5680-3281 ; 0000-0003-0708-9530 ; 0000-0003-4985-8174 ; 0000-0002-0756-3629 ; 0000-0001-9697-8596 ; 0000-0001-6590-4957 ; 0000-0002-5540-2707 ; 0000-0002-6759-0944</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8121285/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8121285/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79343,79344</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33945532$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Pain, Oliver</creatorcontrib><creatorcontrib>Glanville, Kylie P</creatorcontrib><creatorcontrib>Hagenaars, Saskia P</creatorcontrib><creatorcontrib>Selzam, Saskia</creatorcontrib><creatorcontrib>Fürtjes, Anna E</creatorcontrib><creatorcontrib>Gaspar, Héléna A</creatorcontrib><creatorcontrib>Coleman, Jonathan R I</creatorcontrib><creatorcontrib>Rimfeld, Kaili</creatorcontrib><creatorcontrib>Breen, Gerome</creatorcontrib><creatorcontrib>Plomin, Robert</creatorcontrib><creatorcontrib>Folkersen, Lasse</creatorcontrib><creatorcontrib>Lewis, Cathryn M</creatorcontrib><title>Evaluation of 
polygenic prediction methodology within a reference-standardized framework</title><title>PLoS genetics</title><addtitle>PLoS Genet</addtitle><description>The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. 
This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.</description><subject>Biobanks</subject><subject>Biology and Life Sciences</subject><subject>Breast cancer</subject><subject>Cardiovascular disease</subject><subject>Computer Simulation</subject><subject>Consortia</subject><subject>Coronary artery</subject><subject>Datasets as Topic</subject><subject>Diabetes mellitus</subject><subject>Estimates</subject><subject>Ethics</subject><subject>Genetic diversity</subject><subject>Genetic research</subject><subject>Genetic screening</subject><subject>Genetic variation</subject><subject>Genome-Wide Association Study</subject><subject>Genotype</subject><subject>Health risk assessment</subject><subject>Heart diseases</subject><subject>Humans</subject><subject>Inflammatory bowel diseases</subject><subject>Linkage disequilibrium</subject><subject>Medicine and Health Sciences</subject><subject>Methods</subject><subject>Models, Genetic</subject><subject>Multifactorial Inheritance - genetics</subject><subject>Multiple sclerosis</subject><subject>Physical Sciences</subject><subject>Polymorphism, Single Nucleotide - genetics</subject><subject>Precision Medicine</subject><subject>Prostate cancer</subject><subject>Reproducibility of Results</subject><subject>Research and Analysis Methods</subject><subject>Rheumatoid arthritis</subject><subject>Sample size</subject><subject>Statistics</subject><subject>Twin Studies as Topic</subject><subject>Twins - genetics</subject><subject>United 
Kingdom</subject><issn>1553-7404</issn><issn>1553-7390</issn><issn>1553-7404</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><sourceid>DOA</sourceid><recordid>eNqVk99r1TAUx4sobk7_A9GCIPpwr02aNs2LMMbUC8OBDvEtnOZHb2baXJN28_rXm-5241b2oOQh4eRzvic5P5LkOcqWKKfo3aUbfAd2uWlUt0RZxjKMHiSHqCjyBSUZebh3PkiehHCZZXlRMfo4OchzRuIdPky-n16BHaA3rkudTjfObqOeEenGK2nEjb1V_dpJZ12zTa9NvzZdCqlXWnnVCbUIPXQSvDS_lUy1h1ZdO__jafJIgw3q2bQfJRcfTi9OPi3Ozj-uTo7PFoIy3C-wpHWBa6RzYFoRBVVWIEq0RDnJgdaaEcKk1ExkwBAAMKoxEFqqUmmR50fJy53sxrrAp5wEjguCqiJq0UisdoR0cMk33rTgt9yB4TcG5xsOvjfCKo5KWbEas0qTGEKQSlQ1pUWBJUG4zMZo76doQ90qKVTXe7Az0flNZ9a8cVe8QhjhqogCbyYB734OKvS8NUEoa6FTbhjfjXHOSsRG9NVf6P2_m6gG4gdMp12MK0ZRflyWscxVhcpILe-h4pKqNcJ1Sptonzm8nTlEple_-gaGEPjq65f_YD__O3v-bc6-3mPXCmy_Ds4OY0-GOUh2oPAuhNiYdwVBGR9n5TZzfJwVPs1KdHuxX8w7p9vhyP8ARMQO-g</recordid><startdate>20210504</startdate><enddate>20210504</enddate><creator>Pain, Oliver</creator><creator>Glanville, Kylie P</creator><creator>Hagenaars, Saskia P</creator><creator>Selzam, Saskia</creator><creator>Fürtjes, Anna E</creator><creator>Gaspar, Héléna A</creator><creator>Coleman, Jonathan R I</creator><creator>Rimfeld, Kaili</creator><creator>Breen, Gerome</creator><creator>Plomin, Robert</creator><creator>Folkersen, Lasse</creator><creator>Lewis, Cathryn M</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7TO</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FD</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-8321-9435</orcidid><orcidid>https://orcid.org/0000-0002-8249-8476</orcidid><orcidid>https://orcid.org/0000-0001-5680-3281</orcidid><orcidid>https://orcid.org/0000-0003-0708-9530</orcidid><orcidid>https://orcid.org/0000-0003-4985-8174</orcidid><orcidid>https://orcid.org/0000-0002-0756-3629</orcidid><orcidid>https://orcid.org/0000-0001-9697-8596</orcidid><orcidid>https://orcid.org/0000-0001-6590-4957</orcidid><orcidid>https://orcid.org/0000-0002-5540-2707</orcidid><orcidid>https://orcid.org/0000-0002-6759-0944</orcidid></search><sort><creationdate>20210504</creationdate><title>Evaluation of polygenic prediction methodology within a reference-standardized framework</title><author>Pain, Oliver ; Glanville, Kylie P ; Hagenaars, Saskia P ; Selzam, Saskia ; Fürtjes, Anna E ; Gaspar, Héléna A ; Coleman, Jonathan R I ; Rimfeld, Kaili ; Breen, Gerome ; Plomin, Robert ; Folkersen, Lasse ; Lewis, 
Cathryn M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c792t-2d7b52b1f3a9fe4ea805174fd1343a7bf9449ddf9c0a91aaa97f2a476e6efc33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Biobanks</topic><topic>Biology and Life Sciences</topic><topic>Breast cancer</topic><topic>Cardiovascular disease</topic><topic>Computer Simulation</topic><topic>Consortia</topic><topic>Coronary artery</topic><topic>Datasets as Topic</topic><topic>Diabetes mellitus</topic><topic>Estimates</topic><topic>Ethics</topic><topic>Genetic diversity</topic><topic>Genetic research</topic><topic>Genetic screening</topic><topic>Genetic variation</topic><topic>Genome-Wide Association Study</topic><topic>Genotype</topic><topic>Health risk assessment</topic><topic>Heart diseases</topic><topic>Humans</topic><topic>Inflammatory bowel diseases</topic><topic>Linkage disequilibrium</topic><topic>Medicine and Health Sciences</topic><topic>Methods</topic><topic>Models, Genetic</topic><topic>Multifactorial Inheritance - genetics</topic><topic>Multiple sclerosis</topic><topic>Physical Sciences</topic><topic>Polymorphism, Single Nucleotide - genetics</topic><topic>Precision Medicine</topic><topic>Prostate cancer</topic><topic>Reproducibility of Results</topic><topic>Research and Analysis Methods</topic><topic>Rheumatoid arthritis</topic><topic>Sample size</topic><topic>Statistics</topic><topic>Twin Studies as Topic</topic><topic>Twins - genetics</topic><topic>United Kingdom</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pain, Oliver</creatorcontrib><creatorcontrib>Glanville, Kylie P</creatorcontrib><creatorcontrib>Hagenaars, Saskia P</creatorcontrib><creatorcontrib>Selzam, Saskia</creatorcontrib><creatorcontrib>Fürtjes, Anna E</creatorcontrib><creatorcontrib>Gaspar, Héléna A</creatorcontrib><creatorcontrib>Coleman, Jonathan R 
I</creatorcontrib><creatorcontrib>Rimfeld, Kaili</creatorcontrib><creatorcontrib>Breen, Gerome</creatorcontrib><creatorcontrib>Plomin, Robert</creatorcontrib><creatorcontrib>Folkersen, Lasse</creatorcontrib><creatorcontrib>Lewis, Cathryn M</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering 
Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pain, Oliver</au><au>Glanville, Kylie P</au><au>Hagenaars, Saskia P</au><au>Selzam, Saskia</au><au>Fürtjes, Anna E</au><au>Gaspar, Héléna A</au><au>Coleman, Jonathan R I</au><au>Rimfeld, Kaili</au><au>Breen, Gerome</au><au>Plomin, Robert</au><au>Folkersen, Lasse</au><au>Lewis, Cathryn M</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluation of polygenic prediction methodology within a reference-standardized framework</atitle><jtitle>PLoS genetics</jtitle><addtitle>PLoS 
Genet</addtitle><date>2021-05-04</date><risdate>2021</risdate><volume>17</volume><issue>5</issue><spage>e1009021</spage><pages>e1009021-</pages><issn>1553-7404</issn><issn>1553-7390</issn><eissn>1553-7404</eissn><abstract>The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. 
This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>33945532</pmid><doi>10.1371/journal.pgen.1009021</doi><orcidid>https://orcid.org/0000-0001-8321-9435</orcidid><orcidid>https://orcid.org/0000-0002-8249-8476</orcidid><orcidid>https://orcid.org/0000-0001-5680-3281</orcidid><orcidid>https://orcid.org/0000-0003-0708-9530</orcidid><orcidid>https://orcid.org/0000-0003-4985-8174</orcidid><orcidid>https://orcid.org/0000-0002-0756-3629</orcidid><orcidid>https://orcid.org/0000-0001-9697-8596</orcidid><orcidid>https://orcid.org/0000-0001-6590-4957</orcidid><orcidid>https://orcid.org/0000-0002-5540-2707</orcidid><orcidid>https://orcid.org/0000-0002-6759-0944</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1553-7404 |
ispartof | PLoS genetics, 2021-05, Vol.17 (5), p.e1009021 |
issn | 1553-7404 1553-7390 1553-7404 |
language | eng |
recordid | cdi_plos_journals_2541858057 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Public Library of Science (PLoS) |
subjects | Biobanks; Biology and Life Sciences; Breast cancer; Cardiovascular disease; Computer Simulation; Consortia; Coronary artery; Datasets as Topic; Diabetes mellitus; Estimates; Ethics; Genetic diversity; Genetic research; Genetic screening; Genetic variation; Genome-Wide Association Study; Genotype; Health risk assessment; Heart diseases; Humans; Inflammatory bowel diseases; Linkage disequilibrium; Medicine and Health Sciences; Methods; Models, Genetic; Multifactorial Inheritance - genetics; Multiple sclerosis; Physical Sciences; Polymorphism, Single Nucleotide - genetics; Precision Medicine; Prostate cancer; Reproducibility of Results; Research and Analysis Methods; Rheumatoid arthritis; Sample size; Statistics; Twin Studies as Topic; Twins - genetics; United Kingdom |
title | Evaluation of polygenic prediction methodology within a reference-standardized framework |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T00%3A56%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluation%20of%20polygenic%20prediction%20methodology%20within%20a%20reference-standardized%20framework&rft.jtitle=PLoS%20genetics&rft.au=Pain,%20Oliver&rft.date=2021-05-04&rft.volume=17&rft.issue=5&rft.spage=e1009021&rft.pages=e1009021-&rft.issn=1553-7404&rft.eissn=1553-7404&rft_id=info:doi/10.1371/journal.pgen.1009021&rft_dat=%3Cgale_plos_%3EA663948816%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2541858057&rft_id=info:pmid/33945532&rft_galeid=A663948816&rft_doaj_id=oai_doaj_org_article_16d89b298f4a47c48c8b77552d412603&rfr_iscdi=true |
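
The description field above mentions choosing a p-value threshold by 10-fold cross-validation and reporting relative improvements in the correlation between observed and predicted outcomes. The following is a minimal Python sketch of that idea, not the study's code. All function names (`polygenic_score`, `cv_correlation`, `select_threshold`, `relative_improvement`) and the input layout are hypothetical, the per-fold model is assumed to be a simple one-predictor linear regression, and clumping and reference-based linkage disequilibrium estimates are left out entirely.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def polygenic_score(genotypes, effects, pvalues, threshold):
    """Weighted allele-count sum over variants with p <= threshold.

    genotypes: one list of allele counts (0/1/2) per individual (assumed layout).
    """
    return [
        sum(g * b for g, b, p in zip(person, effects, pvalues) if p <= threshold)
        for person in genotypes
    ]


def cv_correlation(score, outcome, k=10, seed=0):
    """Correlation between observed outcomes and k-fold cross-validated predictions.

    Assumes at least two individuals. Each fold fits outcome ~ score on the
    training individuals and predicts the held-out individuals.
    """
    idx = list(range(len(score)))
    random.Random(seed).shuffle(idx)
    predicted = [0.0] * len(score)
    for fold in range(k):
        held_out = idx[fold::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        mx = sum(score[i] for i in train) / len(train)
        my = sum(outcome[i] for i in train) / len(train)
        sxx = sum((score[i] - mx) ** 2 for i in train)
        sxy = sum((score[i] - mx) * (outcome[i] - my) for i in train)
        slope = sxy / sxx if sxx > 0 else 0.0
        for i in held_out:
            predicted[i] = my + slope * (score[i] - mx)
    return pearson(predicted, outcome)


def select_threshold(genotypes, outcome, effects, pvalues, thresholds, k=10):
    """Return (best_threshold, cross-validated correlation at that threshold)."""
    results = [
        (cv_correlation(polygenic_score(genotypes, effects, pvalues, t), outcome, k), t)
        for t in thresholds
    ]
    best_r, best_t = max(results)
    return best_t, best_r


def relative_improvement(r_method, r_baseline):
    """Relative gain in correlation of one method over a baseline method."""
    return (r_method - r_baseline) / r_baseline
```

Under these assumptions, a method reaching a correlation of 0.29 where the baseline reaches 0.25 would show `relative_improvement(0.29, 0.25)` of about 0.16, the kind of figure the description reports as a percentage. The numbers 0.29 and 0.25 are invented for illustration and are not results from the study.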