MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published...

Bibliographic details
Published in:	American journal of human genetics 2024-05, Vol.111 (5), p.990-995
Main authors:	Sun, Quan; Yang, Yingxi; Rosen, Jonathan D.; Chen, Jiawen; Li, Xihao; Guan, Wyliena; Jiang, Min-Zhi; Wen, Jia; Pace, Rhonda G.; Blackman, Scott M.; Bamshad, Michael J.; Gibson, Ronald L.; Cutting, Garry R.; O’Neal, Wanda K.; Knowles, Michael R.; Kooperberg, Charles; Reiner, Alexander P.; Raffield, Laura M.; Carson, April P.; Rich, Stephen S.; Rotter, Jerome I.; Loos, Ruth J.F.; Kenny, Eimear; Jaeger, Byron C.; Min, Yuan-I; Fuchsberger, Christian; Li, Yun
Format:	Article
Language:	eng
Subjects:	Cohort Studies; cross-cohort; Gene Frequency; Genome, Human; genome-wide association studies; Genome-Wide Association Study - methods; Genotype; genotype imputation; Humans; imputation quality; Linkage Disequilibrium; Machine Learning; Polymorphism, Single Nucleotide; Quality Control; rare variants; Software; variant filtering; Whole Genome Sequencing - methods; Whole Genome Sequencing - standards; whole-genome sequencing
Online access:	Full text

container_end_page	995
container_issue	5
container_start_page	990
container_title	American journal of human genetics
container_volume	111
creator	Sun, Quan; Yang, Yingxi; Rosen, Jonathan D.; Chen, Jiawen; Li, Xihao; Guan, Wyliena; Jiang, Min-Zhi; Wen, Jia; Pace, Rhonda G.; Blackman, Scott M.; Bamshad, Michael J.; Gibson, Ronald L.; Cutting, Garry R.; O’Neal, Wanda K.; Knowles, Michael R.; Kooperberg, Charles; Reiner, Alexander P.; Raffield, Laura M.; Carson, April P.; Rich, Stephen S.; Rotter, Jerome I.; Loos, Ruth J.F.; Kenny, Eimear; Jaeger, Byron C.; Min, Yuan-I; Fuchsberger, Christian; Li, Yun
description	Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%–14.4% improvement in squared Pearson correlation with true R², corresponding to 85–218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants. Ever-growing reference panels allow imputation of a huge number (∼10⁸) of lower-frequency variants. However, the standard imputation quality metric poorly reflects the true quality of uncommon variants. We introduce MagicalRsq-X, an extension of MagicalRsq that allows model training across cohorts for which only the genotypes used for imputation are available.
doi_str_mv	10.1016/j.ajhg.2024.04.001
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11080605</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0002929724001162</els_id><sourcerecordid>3043075214</sourcerecordid><originalsourceid>FETCH-LOGICAL-c363t-73dde39a55540d73acfbdfe293b4093943fd14eff726ed6952b4d2832979538f3</originalsourceid><addsrcrecordid>eNp9kU9rGzEQxUVpady0X6CHssde1h1Jq_1TCiWENCk4BEoLvQmtNLJldle2pA3421eu09BcAgM6zG_ejN4j5D2FJQVaf9ou1XazXjJg1RJyAX1BFlTwpqxrEC_JAgBY2bGuOSNvYtxmgLbAX5Mz3ta8FhQWZHWr1k6r4Ufcl78_FxeFDj7GUvuND6lIQU3RYlD9gMUaJ58OOyzcuJuTSs5PxX5Wg0uHYsQUnH5LXlk1RHz38J6TX9-ufl7elKu76--XF6tS85qnsuHGIO-UEKIC03ClbW8sso73FXS8q7g1tEJrG1ajqTvB-sqwluePdIK3lp-Tryfd3dyPaDRO-dBB7oIbVThIr5x82pncRq79vaQUWsjeZIWPDwrB72eMSY4uahwGNaGfo-RQcWgEo1VG2Qn960xA-7iHgjzmILfymIM85iAhF9A89OH_Cx9H_hmfgS8nALNP9w6DjNrhpNG4gDpJ491z-n8AQkKaoQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3043075214</pqid></control><display><type>article</type><title>MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric</title><source>MEDLINE</source><source>Cell Press Free Archives</source><source>Elsevier ScienceDirect Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Sun, Quan ; Yang, Yingxi ; Rosen, Jonathan D. ; Chen, Jiawen ; Li, Xihao ; Guan, Wyliena ; Jiang, Min-Zhi ; Wen, Jia ; Pace, Rhonda G. ; Blackman, Scott M. ; Bamshad, Michael J. ; Gibson, Ronald L. ; Cutting, Garry R. ; O’Neal, Wanda K. ; Knowles, Michael R. ; Kooperberg, Charles ; Reiner, Alexander P. ; Raffield, Laura M. ; Carson, April P. ; Rich, Stephen S. ; Rotter, Jerome I. ; Loos, Ruth J.F. ; Kenny, Eimear ; Jaeger, Byron C. ; Min, Yuan-I ; Fuchsberger, Christian ; Li, Yun</creator><creatorcontrib>Sun, Quan ; Yang, Yingxi ; Rosen, Jonathan D. 
; Chen, Jiawen ; Li, Xihao ; Guan, Wyliena ; Jiang, Min-Zhi ; Wen, Jia ; Pace, Rhonda G. ; Blackman, Scott M. ; Bamshad, Michael J. ; Gibson, Ronald L. ; Cutting, Garry R. ; O’Neal, Wanda K. ; Knowles, Michael R. ; Kooperberg, Charles ; Reiner, Alexander P. ; Raffield, Laura M. ; Carson, April P. ; Rich, Stephen S. ; Rotter, Jerome I. ; Loos, Ruth J.F. ; Kenny, Eimear ; Jaeger, Byron C. ; Min, Yuan-I ; Fuchsberger, Christian ; Li, Yun</creatorcontrib><description>Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%–14.4% improvement in squared Pearson correlation with true R², corresponding to 85–218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants. Ever-growing reference panels allow imputation of a huge number (∼10⁸) of lower-frequency variants. However, the standard imputation quality metric poorly reflects the true quality of uncommon variants. We introduce MagicalRsq-X, an extension of MagicalRsq that allows model training across cohorts for which only the genotypes used for imputation are available.</description><identifier>ISSN: 0002-9297</identifier><identifier>ISSN: 1537-6605</identifier><identifier>EISSN: 1537-6605</identifier><identifier>DOI: 10.1016/j.ajhg.2024.04.001</identifier><identifier>PMID: 38636510</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Cohort Studies ; cross-cohort ; Gene Frequency ; Genome, Human ; genome-wide association studies ; Genome-Wide Association Study - methods ; Genotype ; genotype imputation ; Humans ; imputation quality ; Linkage Disequilibrium ; Machine Learning ; Polymorphism, Single Nucleotide ; Quality Control ; rare variants ; Software ; variant filtering ; Whole Genome Sequencing - methods ; Whole Genome Sequencing - standards ; whole-genome sequencing</subject><ispartof>American journal of human genetics, 2024-05, Vol.111 (5), p.990-995</ispartof><rights>2024 American Society of Human Genetics</rights><rights>Copyright © 2024 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.</rights><rights>2024 American Society of Human Genetics. 
2024 American Society of Human Genetics</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c363t-73dde39a55540d73acfbdfe293b4093943fd14eff726ed6952b4d2832979538f3</cites><orcidid>0000-0002-9275-4189</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11080605/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0002929724001162$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>230,314,723,776,780,881,3537,27903,27904,53769,53771,65309</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38636510$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Sun, Quan</creatorcontrib><creatorcontrib>Yang, Yingxi</creatorcontrib><creatorcontrib>Rosen, Jonathan D.</creatorcontrib><creatorcontrib>Chen, Jiawen</creatorcontrib><creatorcontrib>Li, Xihao</creatorcontrib><creatorcontrib>Guan, Wyliena</creatorcontrib><creatorcontrib>Jiang, Min-Zhi</creatorcontrib><creatorcontrib>Wen, Jia</creatorcontrib><creatorcontrib>Pace, Rhonda G.</creatorcontrib><creatorcontrib>Blackman, Scott M.</creatorcontrib><creatorcontrib>Bamshad, Michael J.</creatorcontrib><creatorcontrib>Gibson, Ronald L.</creatorcontrib><creatorcontrib>Cutting, Garry R.</creatorcontrib><creatorcontrib>O’Neal, Wanda K.</creatorcontrib><creatorcontrib>Knowles, Michael R.</creatorcontrib><creatorcontrib>Kooperberg, Charles</creatorcontrib><creatorcontrib>Reiner, Alexander P.</creatorcontrib><creatorcontrib>Raffield, Laura M.</creatorcontrib><creatorcontrib>Carson, April P.</creatorcontrib><creatorcontrib>Rich, Stephen S.</creatorcontrib><creatorcontrib>Rotter, Jerome I.</creatorcontrib><creatorcontrib>Loos, Ruth J.F.</creatorcontrib><creatorcontrib>Kenny, 
Eimear</creatorcontrib><creatorcontrib>Jaeger, Byron C.</creatorcontrib><creatorcontrib>Min, Yuan-I</creatorcontrib><creatorcontrib>Fuchsberger, Christian</creatorcontrib><creatorcontrib>Li, Yun</creatorcontrib><title>MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric</title><title>American journal of human genetics</title><addtitle>Am J Hum Genet</addtitle><description>Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%–14.4% improvement in squared Pearson correlation with true R², corresponding to 85–218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants. Ever-growing reference panels allow imputation of a huge number (∼10⁸) of lower-frequency variants. However, the standard imputation quality metric poorly reflects the true quality of uncommon variants. We introduce MagicalRsq-X, an extension of MagicalRsq that allows model training across cohorts for which only the genotypes used for imputation are available.</description><subject>Cohort Studies</subject><subject>cross-cohort</subject><subject>Gene Frequency</subject><subject>Genome, Human</subject><subject>genome-wide association studies</subject><subject>Genome-Wide Association Study - methods</subject><subject>Genotype</subject><subject>genotype imputation</subject><subject>Humans</subject><subject>imputation quality</subject><subject>Linkage Disequilibrium</subject><subject>Machine Learning</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Quality Control</subject><subject>rare variants</subject><subject>Software</subject><subject>variant filtering</subject><subject>Whole Genome Sequencing - methods</subject><subject>Whole Genome Sequencing - standards</subject><subject>whole-genome 
sequencing</subject><issn>0002-9297</issn><issn>1537-6605</issn><issn>1537-6605</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kU9rGzEQxUVpady0X6CHssde1h1Jq_1TCiWENCk4BEoLvQmtNLJldle2pA3421eu09BcAgM6zG_ejN4j5D2FJQVaf9ou1XazXjJg1RJyAX1BFlTwpqxrEC_JAgBY2bGuOSNvYtxmgLbAX5Mz3ta8FhQWZHWr1k6r4Ufcl78_FxeFDj7GUvuND6lIQU3RYlD9gMUaJ58OOyzcuJuTSs5PxX5Wg0uHYsQUnH5LXlk1RHz38J6TX9-ufl7elKu76--XF6tS85qnsuHGIO-UEKIC03ClbW8sso73FXS8q7g1tEJrG1ajqTvB-sqwluePdIK3lp-Tryfd3dyPaDRO-dBB7oIbVThIr5x82pncRq79vaQUWsjeZIWPDwrB72eMSY4uahwGNaGfo-RQcWgEo1VG2Qn960xA-7iHgjzmILfymIM85iAhF9A89OH_Cx9H_hmfgS8nALNP9w6DjNrhpNG4gDpJ491z-n8AQkKaoQ</recordid><startdate>20240502</startdate><enddate>20240502</enddate><creator>Sun, Quan</creator><creator>Yang, Yingxi</creator><creator>Rosen, Jonathan D.</creator><creator>Chen, Jiawen</creator><creator>Li, Xihao</creator><creator>Guan, Wyliena</creator><creator>Jiang, Min-Zhi</creator><creator>Wen, Jia</creator><creator>Pace, Rhonda G.</creator><creator>Blackman, Scott M.</creator><creator>Bamshad, Michael J.</creator><creator>Gibson, Ronald L.</creator><creator>Cutting, Garry R.</creator><creator>O’Neal, Wanda K.</creator><creator>Knowles, Michael R.</creator><creator>Kooperberg, Charles</creator><creator>Reiner, Alexander P.</creator><creator>Raffield, Laura M.</creator><creator>Carson, April P.</creator><creator>Rich, Stephen S.</creator><creator>Rotter, Jerome I.</creator><creator>Loos, Ruth J.F.</creator><creator>Kenny, Eimear</creator><creator>Jaeger, Byron C.</creator><creator>Min, Yuan-I</creator><creator>Fuchsberger, Christian</creator><creator>Li, Yun</creator><general>Elsevier 
Inc</general><general>Elsevier</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-9275-4189</orcidid></search><sort><creationdate>20240502</creationdate><title>MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric</title><author>Sun, Quan ; Yang, Yingxi ; Rosen, Jonathan D. ; Chen, Jiawen ; Li, Xihao ; Guan, Wyliena ; Jiang, Min-Zhi ; Wen, Jia ; Pace, Rhonda G. ; Blackman, Scott M. ; Bamshad, Michael J. ; Gibson, Ronald L. ; Cutting, Garry R. ; O’Neal, Wanda K. ; Knowles, Michael R. ; Kooperberg, Charles ; Reiner, Alexander P. ; Raffield, Laura M. ; Carson, April P. ; Rich, Stephen S. ; Rotter, Jerome I. ; Loos, Ruth J.F. ; Kenny, Eimear ; Jaeger, Byron C. ; Min, Yuan-I ; Fuchsberger, Christian ; Li, Yun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c363t-73dde39a55540d73acfbdfe293b4093943fd14eff726ed6952b4d2832979538f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Cohort Studies</topic><topic>cross-cohort</topic><topic>Gene Frequency</topic><topic>Genome, Human</topic><topic>genome-wide association studies</topic><topic>Genome-Wide Association Study - methods</topic><topic>Genotype</topic><topic>genotype imputation</topic><topic>Humans</topic><topic>imputation quality</topic><topic>Linkage Disequilibrium</topic><topic>Machine Learning</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Quality Control</topic><topic>rare variants</topic><topic>Software</topic><topic>variant filtering</topic><topic>Whole Genome Sequencing - methods</topic><topic>Whole Genome Sequencing - standards</topic><topic>whole-genome sequencing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, 
Quan</creatorcontrib><creatorcontrib>Yang, Yingxi</creatorcontrib><creatorcontrib>Rosen, Jonathan D.</creatorcontrib><creatorcontrib>Chen, Jiawen</creatorcontrib><creatorcontrib>Li, Xihao</creatorcontrib><creatorcontrib>Guan, Wyliena</creatorcontrib><creatorcontrib>Jiang, Min-Zhi</creatorcontrib><creatorcontrib>Wen, Jia</creatorcontrib><creatorcontrib>Pace, Rhonda G.</creatorcontrib><creatorcontrib>Blackman, Scott M.</creatorcontrib><creatorcontrib>Bamshad, Michael J.</creatorcontrib><creatorcontrib>Gibson, Ronald L.</creatorcontrib><creatorcontrib>Cutting, Garry R.</creatorcontrib><creatorcontrib>O’Neal, Wanda K.</creatorcontrib><creatorcontrib>Knowles, Michael R.</creatorcontrib><creatorcontrib>Kooperberg, Charles</creatorcontrib><creatorcontrib>Reiner, Alexander P.</creatorcontrib><creatorcontrib>Raffield, Laura M.</creatorcontrib><creatorcontrib>Carson, April P.</creatorcontrib><creatorcontrib>Rich, Stephen S.</creatorcontrib><creatorcontrib>Rotter, Jerome I.</creatorcontrib><creatorcontrib>Loos, Ruth J.F.</creatorcontrib><creatorcontrib>Kenny, Eimear</creatorcontrib><creatorcontrib>Jaeger, Byron C.</creatorcontrib><creatorcontrib>Min, Yuan-I</creatorcontrib><creatorcontrib>Fuchsberger, Christian</creatorcontrib><creatorcontrib>Li, Yun</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>American journal of human genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sun, Quan</au><au>Yang, Yingxi</au><au>Rosen, Jonathan D.</au><au>Chen, Jiawen</au><au>Li, Xihao</au><au>Guan, Wyliena</au><au>Jiang, Min-Zhi</au><au>Wen, Jia</au><au>Pace, Rhonda G.</au><au>Blackman, Scott 
M.</au><au>Bamshad, Michael J.</au><au>Gibson, Ronald L.</au><au>Cutting, Garry R.</au><au>O’Neal, Wanda K.</au><au>Knowles, Michael R.</au><au>Kooperberg, Charles</au><au>Reiner, Alexander P.</au><au>Raffield, Laura M.</au><au>Carson, April P.</au><au>Rich, Stephen S.</au><au>Rotter, Jerome I.</au><au>Loos, Ruth J.F.</au><au>Kenny, Eimear</au><au>Jaeger, Byron C.</au><au>Min, Yuan-I</au><au>Fuchsberger, Christian</au><au>Li, Yun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric</atitle><jtitle>American journal of human genetics</jtitle><addtitle>Am J Hum Genet</addtitle><date>2024-05-02</date><risdate>2024</risdate><volume>111</volume><issue>5</issue><spage>990</spage><epage>995</epage><pages>990-995</pages><issn>0002-9297</issn><issn>1537-6605</issn><eissn>1537-6605</eissn><abstract>Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. 
Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%–14.4% improvement in squared Pearson correlation with true R², corresponding to 85–218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants. Ever-growing reference panels allow imputation of a huge number (∼10⁸) of lower-frequency variants. However, the standard imputation quality metric poorly reflects the true quality of uncommon variants. We introduce MagicalRsq-X, an extension of MagicalRsq that allows model training across cohorts for which only the genotypes used for imputation are available.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>38636510</pmid><doi>10.1016/j.ajhg.2024.04.001</doi><tpages>6</tpages><orcidid>https://orcid.org/0000-0002-9275-4189</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0002-9297
ispartof	American journal of human genetics, 2024-05, Vol.111 (5), p.990-995
issn	0002-9297 1537-6605 1537-6605
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11080605
source	MEDLINE; Cell Press Free Archives; Elsevier ScienceDirect Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects	Cohort Studies; cross-cohort; Gene Frequency; Genome, Human; genome-wide association studies; Genome-Wide Association Study - methods; Genotype; genotype imputation; Humans; imputation quality; Linkage Disequilibrium; Machine Learning; Polymorphism, Single Nucleotide; Quality Control; rare variants; Software; variant filtering; Whole Genome Sequencing - methods; Whole Genome Sequencing - standards; whole-genome sequencing
title	MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T15%3A52%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MagicalRsq-X:%20A%20cross-cohort%20transferable%20genotype%20imputation%20quality%20metric&rft.jtitle=American%20journal%20of%20human%20genetics&rft.au=Sun,%20Quan&rft.date=2024-05-02&rft.volume=111&rft.issue=5&rft.spage=990&rft.epage=995&rft.pages=990-995&rft.issn=0002-9297&rft.eissn=1537-6605&rft_id=info:doi/10.1016/j.ajhg.2024.04.001&rft_dat=%3Cproquest_pubme%3E3043075214%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3043075214&rft_id=info:pmid/38636510&rft_els_id=S0002929724001162&rfr_iscdi=true