A harmonized public resource of deeply sequenced diverse human genomes
Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because...
Published in: | Genome research 2024-05, Vol.34 (5), p.796-809 |
---|---|
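Main authors: | Koenig, Zan; Yohannes, Mary T; Nkambule, Lethukuthula L; Zhao, Xuefang; Goodrich, Julia K; Kim, Heesu Ally; Wilson, Michael W; Tiao, Grace; Hao, Stephanie P; Sahakian, Nareh; Chao, Katherine R; Walker, Mark A; Lyu, Yunfei; Rehm, Heidi L; Neale, Benjamin M; Talkowski, Michael E; Daly, Mark J; Brand, Harrison; Karczewski, Konrad J; Atkinson, Elizabeth G; Martin, Alicia R |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |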
container_end_page | 809 |
---|---|
container_issue | 5 |
container_start_page | 796 |
container_title | Genome research |
container_volume | 34 |
creator | Koenig, Zan; Yohannes, Mary T; Nkambule, Lethukuthula L; Zhao, Xuefang; Goodrich, Julia K; Kim, Heesu Ally; Wilson, Michael W; Tiao, Grace; Hao, Stephanie P; Sahakian, Nareh; Chao, Katherine R; Walker, Mark A; Lyu, Yunfei; Rehm, Heidi L; Neale, Benjamin M; Talkowski, Michael E; Daly, Mark J; Brand, Harrison; Karczewski, Konrad J; Atkinson, Elizabeth G; Martin, Alicia R |
description | Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations. |
doi_str_mv | 10.1101/gr.278378.123 |
format | Article |
fullrecord | PMID: 38749656; ISSN: 1088-9051; EISSN: 1549-5469; publisher: United States: Cold Spring Harbor Laboratory Press; corporate contributor: gnomAD Project Consortium; rights: 2024 Koenig et al.; Published by Cold Spring Harbor Laboratory Press; Copyright Cold Spring Harbor Laboratory Press May 2024; peer reviewed; open access (free to read); source type: Open Access Repository; total pages: 14; PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11216312/; ORCID: 0000-0002-0949-8752, 0000-0002-6025-0015, 0000-0003-1513-6077, 0000-0003-0241-3522, 0000-0003-2897-2410, 0000-0002-6308-776X |
fulltext | fulltext |
identifier | ISSN: 1088-9051 |
ispartof | Genome research, 2024-05, Vol.34 (5), p.796-809 |
issn | 1088-9051; 1549-5469 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11216312 |
source | MEDLINE; PubMed Central; Alma/SFX Local Collection |
subjects | Databases, Genetic; Genetic diversity; Genetic Variation; Genome, Human; Genomes; Genomics; Genomics - methods; Haplotypes; High-Throughput Nucleotide Sequencing - methods; Human Genome Project; Humans; Population structure; Population studies; Resource |
title | A harmonized public resource of deeply sequenced diverse human genomes |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T01%3A27%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20harmonized%20public%20resource%20of%20deeply%20sequenced%20diverse%20human%20genomes&rft.jtitle=Genome%20research&rft.au=Koenig,%20Zan&rft.aucorp=gnomAD%20Project%20Consortium&rft.date=2024-05-01&rft.volume=34&rft.issue=5&rft.spage=796&rft.epage=809&rft.pages=796-809&rft.issn=1088-9051&rft.eissn=1549-5469&rft_id=info:doi/10.1101/gr.278378.123&rft_dat=%3Cproquest_pubme%3E3076294468%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3076294468&rft_id=info:pmid/38749656&rfr_iscdi=true |