Summix: A method for detecting and adjusting for population structure in genetic summary data

Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power,...

Detailed description

Bibliographic details
Published in: American journal of human genetics 2021-07, Vol.108 (7), p.1270-1282
Main authors: Arriaga-MacKenzie, Ian S., Matesi, Gregory, Chen, Samuel, Ronco, Alexandria, Marker, Katie M., Hall, Jordan R., Scherenberg, Ryan, Khajeh-Sharafabadi, Mobin, Wu, Yinfei, Gignoux, Christopher R., Null, Megan, Hendricks, Audrey E.
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 1282
container_issue 7
container_start_page 1270
container_title American journal of human genetics
container_volume 108
creator Arriaga-MacKenzie, Ian S.
Matesi, Gregory
Chen, Samuel
Ronco, Alexandria
Marker, Katie M.
Hall, Jordan R.
Scherenberg, Ryan
Khajeh-Sharafabadi, Mobin
Wu, Yinfei
Gignoux, Christopher R.
Null, Megan
Hendricks, Audrey E.
description Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
doi_str_mv 10.1016/j.ajhg.2021.05.016
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmed_primary_34157305</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0002929721002214</els_id><sourcerecordid>2544460068</sourcerecordid><originalsourceid>FETCH-LOGICAL-c455t-630efd739ece9664d96201b22cc289634fc0407eb717c663ef69803809efb1be3</originalsourceid><addsrcrecordid>eNqNkV-L1DAUxYso7rj6BXyQPArSepO0aSuysAz-gwUf1EcJaXI7k9I2Y5Ku-u1Nd8ZBX8SnJDfnnNybX5Y9pVBQoOLlUKhhvysYMFpAVaTSvWxDK17nQkB1P9sAAMtb1tYX2aMQBgBKG-APswte0qrmUG2yr5-WabI_XpFrMmHcO0N654nBiDraeUfUbIgywxLuTuvdwR2WUUXrZhKiX3RcPBI7kx3OGK0mIQUq_5MYFdXj7EGvxoBPTutl9uXtm8_b9_nNx3cfttc3uS6rKuaCA_am5i1qbIUoTSsY0I4xrVnTCl72GkqosatprYXg2Is2DdJAi31HO-SX2dUx97B0ExqNc_RqlAdv11akU1b-fTPbvdy5W9lwxlpep4DnpwDvvi0Yopxs0DiOaka3BMmqsiwFgGiSlB2l2rsQPPbnZyjIlYsc5MpFrlwkVDKVkunZnw2eLb9BJEFzFHzHzvVBW5w1nmUJpBDpJ8oq7aDZ2ngHYOuWOSbri_-3JvXroxoTj1uLXp4cxvrEXBpn_zXIL7LTwS4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2544460068</pqid></control><display><type>article</type><title>Summix: A method for detecting and adjusting for population structure in genetic summary data</title><source>MEDLINE</source><source>Cell Press Free Archives</source><source>Web of Science - Science Citation Index Expanded - 2021&lt;img src="https://exlibris-pub.s3.amazonaws.com/fromwos-v2.jpg" /&gt;</source><source>Access via ScienceDirect (Elsevier)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Arriaga-MacKenzie, Ian S. ; Matesi, Gregory ; Chen, Samuel ; Ronco, Alexandria ; Marker, Katie M. ; Hall, Jordan R. ; Scherenberg, Ryan ; Khajeh-Sharafabadi, Mobin ; Wu, Yinfei ; Gignoux, Christopher R. ; Null, Megan ; Hendricks, Audrey E.</creator><creatorcontrib>Arriaga-MacKenzie, Ian S. ; Matesi, Gregory ; Chen, Samuel ; Ronco, Alexandria ; Marker, Katie M. ; Hall, Jordan R. 
; Scherenberg, Ryan ; Khajeh-Sharafabadi, Mobin ; Wu, Yinfei ; Gignoux, Christopher R. ; Null, Megan ; Hendricks, Audrey E.</creatorcontrib><description>Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. 
Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.</description><identifier>ISSN: 0002-9297</identifier><identifier>EISSN: 1537-6605</identifier><identifier>DOI: 10.1016/j.ajhg.2021.05.016</identifier><identifier>PMID: 34157305</identifier><language>eng</language><publisher>CAMBRIDGE: Elsevier Inc</publisher><subject>allele frequency ; Alleles ; ancestry ; common controls ; Computer Simulation ; Data Interpretation, Statistical ; external controls ; Gene Frequency ; Genetics &amp; Heredity ; gnomAD ; Humans ; Inheritance Patterns ; Life Sciences &amp; Biomedicine ; Metagenomics - methods ; Pedigree ; population stratification ; population structure ; Racial Groups - genetics ; Science &amp; Technology ; Software ; summary</subject><ispartof>American journal of human genetics, 2021-07, Vol.108 (7), p.1270-1282</ispartof><rights>2021 American Society of Human Genetics</rights><rights>Copyright © 2021 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.</rights><rights>2021 American Society of Human Genetics. 
2021 American Society of Human Genetics</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>7</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000668964500008</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c455t-630efd739ece9664d96201b22cc289634fc0407eb717c663ef69803809efb1be3</citedby><cites>FETCH-LOGICAL-c455t-630efd739ece9664d96201b22cc289634fc0407eb717c663ef69803809efb1be3</cites><orcidid>0000-0003-1525-0574 ; 0000-0003-3441-401X ; 0000-0002-7152-0287</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8322937/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.ajhg.2021.05.016$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>230,315,728,781,785,886,3551,27929,27930,39263,46000,53796,53798</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34157305$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Arriaga-MacKenzie, Ian S.</creatorcontrib><creatorcontrib>Matesi, Gregory</creatorcontrib><creatorcontrib>Chen, Samuel</creatorcontrib><creatorcontrib>Ronco, Alexandria</creatorcontrib><creatorcontrib>Marker, Katie M.</creatorcontrib><creatorcontrib>Hall, Jordan R.</creatorcontrib><creatorcontrib>Scherenberg, Ryan</creatorcontrib><creatorcontrib>Khajeh-Sharafabadi, Mobin</creatorcontrib><creatorcontrib>Wu, Yinfei</creatorcontrib><creatorcontrib>Gignoux, Christopher R.</creatorcontrib><creatorcontrib>Null, Megan</creatorcontrib><creatorcontrib>Hendricks, Audrey E.</creatorcontrib><title>Summix: A method for detecting and adjusting for population structure in genetic summary data</title><title>American journal of human genetics</title><addtitle>AM J HUM 
GENET</addtitle><addtitle>Am J Hum Genet</addtitle><description>Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. 
Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.</description><subject>allele frequency</subject><subject>Alleles</subject><subject>ancestry</subject><subject>common controls</subject><subject>Computer Simulation</subject><subject>Data Interpretation, Statistical</subject><subject>external controls</subject><subject>Gene Frequency</subject><subject>Genetics &amp; Heredity</subject><subject>gnomAD</subject><subject>Humans</subject><subject>Inheritance Patterns</subject><subject>Life Sciences &amp; Biomedicine</subject><subject>Metagenomics - methods</subject><subject>Pedigree</subject><subject>population stratification</subject><subject>population structure</subject><subject>Racial Groups - genetics</subject><subject>Science &amp; Technology</subject><subject>Software</subject><subject>summary</subject><issn>0002-9297</issn><issn>1537-6605</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>HGBXW</sourceid><sourceid>EIF</sourceid><recordid>eNqNkV-L1DAUxYso7rj6BXyQPArSepO0aSuysAz-gwUf1EcJaXI7k9I2Y5Ku-u1Nd8ZBX8SnJDfnnNybX5Y9pVBQoOLlUKhhvysYMFpAVaTSvWxDK17nQkB1P9sAAMtb1tYX2aMQBgBKG-APswte0qrmUG2yr5-WabI_XpFrMmHcO0N654nBiDraeUfUbIgywxLuTuvdwR2WUUXrZhKiX3RcPBI7kx3OGK0mIQUq_5MYFdXj7EGvxoBPTutl9uXtm8_b9_nNx3cfttc3uS6rKuaCA_am5i1qbIUoTSsY0I4xrVnTCl72GkqosatprYXg2Is2DdJAi31HO-SX2dUx97B0ExqNc_RqlAdv11akU1b-fTPbvdy5W9lwxlpep4DnpwDvvi0Yopxs0DiOaka3BMmqsiwFgGiSlB2l2rsQPPbnZyjIlYsc5MpFrlwkVDKVkunZnw2eLb9BJEFzFHzHzvVBW5w1nmUJpBDpJ8oq7aDZ2ngHYOuWOSbri_-3JvXroxoTj1uLXp4cxvrEXBpn_zXIL7LTwS4</recordid><startdate>20210701</startdate><enddate>20210701</enddate><creator>Arriaga-MacKenzie, Ian S.</creator><creator>Matesi, Gregory</creator><creator>Chen, Samuel</creator><creator>Ronco, Alexandria</creator><creator>Marker, Katie M.</creator><creator>Hall, Jordan R.</creator><creator>Scherenberg, 
Ryan</creator><creator>Khajeh-Sharafabadi, Mobin</creator><creator>Wu, Yinfei</creator><creator>Gignoux, Christopher R.</creator><creator>Null, Megan</creator><creator>Hendricks, Audrey E.</creator><general>Elsevier Inc</general><general>Elsevier</general><scope>BLEPL</scope><scope>DTL</scope><scope>HGBXW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-1525-0574</orcidid><orcidid>https://orcid.org/0000-0003-3441-401X</orcidid><orcidid>https://orcid.org/0000-0002-7152-0287</orcidid></search><sort><creationdate>20210701</creationdate><title>Summix: A method for detecting and adjusting for population structure in genetic summary data</title><author>Arriaga-MacKenzie, Ian S. ; Matesi, Gregory ; Chen, Samuel ; Ronco, Alexandria ; Marker, Katie M. ; Hall, Jordan R. ; Scherenberg, Ryan ; Khajeh-Sharafabadi, Mobin ; Wu, Yinfei ; Gignoux, Christopher R. 
; Null, Megan ; Hendricks, Audrey E.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c455t-630efd739ece9664d96201b22cc289634fc0407eb717c663ef69803809efb1be3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>allele frequency</topic><topic>Alleles</topic><topic>ancestry</topic><topic>common controls</topic><topic>Computer Simulation</topic><topic>Data Interpretation, Statistical</topic><topic>external controls</topic><topic>Gene Frequency</topic><topic>Genetics &amp; Heredity</topic><topic>gnomAD</topic><topic>Humans</topic><topic>Inheritance Patterns</topic><topic>Life Sciences &amp; Biomedicine</topic><topic>Metagenomics - methods</topic><topic>Pedigree</topic><topic>population stratification</topic><topic>population structure</topic><topic>Racial Groups - genetics</topic><topic>Science &amp; Technology</topic><topic>Software</topic><topic>summary</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Arriaga-MacKenzie, Ian S.</creatorcontrib><creatorcontrib>Matesi, Gregory</creatorcontrib><creatorcontrib>Chen, Samuel</creatorcontrib><creatorcontrib>Ronco, Alexandria</creatorcontrib><creatorcontrib>Marker, Katie M.</creatorcontrib><creatorcontrib>Hall, Jordan R.</creatorcontrib><creatorcontrib>Scherenberg, Ryan</creatorcontrib><creatorcontrib>Khajeh-Sharafabadi, Mobin</creatorcontrib><creatorcontrib>Wu, Yinfei</creatorcontrib><creatorcontrib>Gignoux, Christopher R.</creatorcontrib><creatorcontrib>Null, Megan</creatorcontrib><creatorcontrib>Hendricks, Audrey E.</creatorcontrib><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>Web of Science - Science Citation Index Expanded - 2021</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>American journal of human genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Arriaga-MacKenzie, Ian S.</au><au>Matesi, Gregory</au><au>Chen, Samuel</au><au>Ronco, Alexandria</au><au>Marker, Katie M.</au><au>Hall, Jordan R.</au><au>Scherenberg, Ryan</au><au>Khajeh-Sharafabadi, Mobin</au><au>Wu, Yinfei</au><au>Gignoux, Christopher R.</au><au>Null, Megan</au><au>Hendricks, Audrey E.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Summix: A method for detecting and adjusting for population structure in genetic summary data</atitle><jtitle>American journal of human genetics</jtitle><stitle>AM J HUM GENET</stitle><addtitle>Am J Hum Genet</addtitle><date>2021-07-01</date><risdate>2021</risdate><volume>108</volume><issue>7</issue><spage>1270</spage><epage>1282</epage><pages>1270-1282</pages><issn>0002-9297</issn><eissn>1537-6605</eissn><abstract>Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. 
Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.</abstract><cop>CAMBRIDGE</cop><pub>Elsevier Inc</pub><pmid>34157305</pmid><doi>10.1016/j.ajhg.2021.05.016</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0003-1525-0574</orcidid><orcidid>https://orcid.org/0000-0003-3441-401X</orcidid><orcidid>https://orcid.org/0000-0002-7152-0287</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0002-9297
ispartof American journal of human genetics, 2021-07, Vol.108 (7), p.1270-1282
issn 0002-9297
1537-6605
language eng
recordid cdi_pubmed_primary_34157305
source MEDLINE; Cell Press Free Archives; Web of Science - Science Citation Index Expanded - 2021; Access via ScienceDirect (Elsevier); EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects allele frequency
Alleles
ancestry
common controls
Computer Simulation
Data Interpretation, Statistical
external controls
Gene Frequency
Genetics & Heredity
gnomAD
Humans
Inheritance Patterns
Life Sciences & Biomedicine
Metagenomics - methods
Pedigree
population stratification
population structure
Racial Groups - genetics
Science & Technology
Software
summary
title Summix: A method for detecting and adjusting for population structure in genetic summary data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T05%3A56%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Summix:%20A%20method%20for%20detecting%20and%20adjusting%20for%20population%20structure%20in%20genetic%20summary%20data&rft.jtitle=American%20journal%20of%20human%20genetics&rft.au=Arriaga-MacKenzie,%20Ian%20S.&rft.date=2021-07-01&rft.volume=108&rft.issue=7&rft.spage=1270&rft.epage=1282&rft.pages=1270-1282&rft.issn=0002-9297&rft.eissn=1537-6605&rft_id=info:doi/10.1016/j.ajhg.2021.05.016&rft_dat=%3Cproquest_pubme%3E2544460068%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2544460068&rft_id=info:pmid/34157305&rft_els_id=S0002929721002214&rfr_iscdi=true
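
Illustrative sketch (not part of the catalog record and not the authors' implementation): the description above says Summix estimates ancestry proportions from summary allele frequencies. One way to picture that idea is to model each observed allele frequency as a weighted mix of reference-group allele frequencies, then look for non-negative weights that sum to 1 and minimize the squared error. The least-squares objective, the projected-gradient solver, the function names, and the toy numbers below are all assumptions made for illustration; none of them comes from the record or from the Summix R package.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_k >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the last valid index
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, reference, n_iter=2000):
    """Hypothetical helper: fit mixture weights by projected gradient descent.

    observed  -- list of observed allele frequencies, one per variant
    reference -- list of rows, one per variant; each row holds the allele
                 frequency of that variant in every reference group
    """
    n_groups = len(reference[0])
    pi = [1.0 / n_groups] * n_groups  # start from equal proportions
    # Step size 1 / (2 * ||R||_F^2) never exceeds 1 / Lipschitz constant.
    frob_sq = sum(r * r for row in reference for r in row)
    step = 1.0 / (2.0 * frob_sq)
    for _ in range(n_iter):
        residuals = [
            sum(p * r for p, r in zip(pi, row)) - o
            for row, o in zip(reference, observed)
        ]
        grad = [
            2.0 * sum(res * row[k] for res, row in zip(residuals, reference))
            for k in range(n_groups)
        ]
        pi = project_to_simplex([p - step * g for p, g in zip(pi, grad)])
    return pi


# Toy, invented allele frequencies for two hypothetical reference groups.
reference = [
    [0.10, 0.70],
    [0.80, 0.20],
    [0.30, 0.50],
    [0.60, 0.10],
    [0.05, 0.60],
    [0.90, 0.30],
]
true_pi = [0.8, 0.2]  # mixture used to simulate the noise-free observed data
observed = [sum(p * r for p, r in zip(true_pi, row)) for row in reference]

estimated = estimate_proportions(observed, reference)
print([round(p, 4) for p in estimated])
```

Because the toy observed frequencies are simulated without noise, the recovered weights should match the simulated mixture closely; real summary data would add sampling error, which is where the block bootstrap mentioned in the description would come in.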