Identifying crossovers and shared genetic material in whole genome sequencing data from families

Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special...

Detailed description

Bibliographic details
Published in: Genome research 2023-10, Vol.33 (10), p.1747-1756
Main authors: Paskov, Kelley, Chrisman, Brianna, Stockham, Nathaniel, Washington, Peter Yigitcan, Dunlap, Kaitlyn, Jung, Jae-Yoon, Wall, Dennis P
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1756
container_issue 10
container_start_page 1747
container_title Genome research
container_volume 33
creator Paskov, Kelley
Chrisman, Brianna
Stockham, Nathaniel
Washington, Peter Yigitcan
Dunlap, Kaitlyn
Jung, Jae-Yoon
Wall, Dennis P
description Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.
doi_str_mv 10.1101/gr.277172.122
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_10691535</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2882321739</sourcerecordid><originalsourceid>FETCH-LOGICAL-c377t-2f6669200821bb5b4d997bd1047431db4470c39aab724a83143daf49aa506a003</originalsourceid><addsrcrecordid>eNpdkUtP3DAURq0KVF5dskWW2HSTwa_4sUIVagEJqZt2bW5iJ2OU2GBnqPj39TAUtV3Z8j365HM_hE4pWVFK6MWYV0wpqtiKMvYBHdJWmKYV0uzVO9G6MaSlB-iolAdCCBdaf0QHXGlltKSH6P7W-biE4SXEEfc5lZKefS4YosNlDdk7PProl9DjGRafA0w4RPxrnSa_naTZ4-KfNj722wQHC-AhpxkPMIcp-HKC9geYiv_0dh6jn9--_ri6ae6-X99efblreq7U0rBBSmkYIZrRrms74YxRnaNEKMGp64RQpOcGoFNMgOZUcAeDqA8tkVDFjtHlLvdx083e9dUqw2Qfc5ghv9gEwf47iWFtx_RsKZGGtrytCZ_fEnKqQmWxcyi9nyaIPm2KZVozzqjipqLn_6EPaZNj9dtSRmoliKxUs6Ne95r98P4bSuy2PDtmuyvP1vIqf_a3wjv9py3-G8OHlds</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2889687406</pqid></control><display><type>article</type><title>Identifying crossovers and shared genetic material in whole genome sequencing data from families</title><source>MEDLINE</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Paskov, Kelley ; Chrisman, Brianna ; Stockham, Nathaniel ; Washington, Peter Yigitcan ; Dunlap, Kaitlyn ; Jung, Jae-Yoon ; Wall, Dennis P</creator><creatorcontrib>Paskov, Kelley ; Chrisman, Brianna ; Stockham, Nathaniel ; Washington, Peter Yigitcan ; Dunlap, Kaitlyn ; Jung, Jae-Yoon ; Wall, Dennis P</creatorcontrib><description>Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. 
We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.</description><identifier>ISSN: 1088-9051</identifier><identifier>EISSN: 1549-5469</identifier><identifier>DOI: 10.1101/gr.277172.122</identifier><identifier>PMID: 37879861</identifier><language>eng</language><publisher>United States: Cold Spring Harbor Laboratory Press</publisher><subject>Gene mapping ; Genome ; Genomes ; Haplotypes ; Heredity ; Humans ; Inheritance Patterns ; Markov chains ; Methods ; Siblings ; Whole Genome Sequencing ; X chromosomes</subject><ispartof>Genome research, 2023-10, Vol.33 (10), p.1747-1756</ispartof><rights>2023 Paskov et al.; Published by Cold Spring Harbor Laboratory Press.</rights><rights>Copyright Cold Spring Harbor Laboratory Press Oct 2023</rights><rights>2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c377t-2f6669200821bb5b4d997bd1047431db4470c39aab724a83143daf49aa506a003</citedby><cites>FETCH-LOGICAL-c377t-2f6669200821bb5b4d997bd1047431db4470c39aab724a83143daf49aa506a003</cites><orcidid>0000-0003-3276-4411 ; 0000-0002-7889-9146 ; 
0000-0002-7157-607X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691535/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691535/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37879861$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Paskov, Kelley</creatorcontrib><creatorcontrib>Chrisman, Brianna</creatorcontrib><creatorcontrib>Stockham, Nathaniel</creatorcontrib><creatorcontrib>Washington, Peter Yigitcan</creatorcontrib><creatorcontrib>Dunlap, Kaitlyn</creatorcontrib><creatorcontrib>Jung, Jae-Yoon</creatorcontrib><creatorcontrib>Wall, Dennis P</creatorcontrib><title>Identifying crossovers and shared genetic material in whole genome sequencing data from families</title><title>Genome research</title><addtitle>Genome Res</addtitle><description>Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). 
Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.</description><subject>Gene mapping</subject><subject>Genome</subject><subject>Genomes</subject><subject>Haplotypes</subject><subject>Heredity</subject><subject>Humans</subject><subject>Inheritance Patterns</subject><subject>Markov chains</subject><subject>Methods</subject><subject>Siblings</subject><subject>Whole Genome Sequencing</subject><subject>X chromosomes</subject><issn>1088-9051</issn><issn>1549-5469</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkUtP3DAURq0KVF5dskWW2HSTwa_4sUIVagEJqZt2bW5iJ2OU2GBnqPj39TAUtV3Z8j365HM_hE4pWVFK6MWYV0wpqtiKMvYBHdJWmKYV0uzVO9G6MaSlB-iolAdCCBdaf0QHXGlltKSH6P7W-biE4SXEEfc5lZKefS4YosNlDdk7PProl9DjGRafA0w4RPxrnSa_naTZ4-KfNj722wQHC-AhpxkPMIcp-HKC9geYiv_0dh6jn9--_ri6ae6-X99efblreq7U0rBBSmkYIZrRrms74YxRnaNEKMGp64RQpOcGoFNMgOZUcAeDqA8tkVDFjtHlLvdx083e9dUqw2Qfc5ghv9gEwf47iWFtx_RsKZGGtrytCZ_fEnKqQmWxcyi9nyaIPm2KZVozzqjipqLn_6EPaZNj9dtSRmoliKxUs6Ne95r98P4bSuy2PDtmuyvP1vIqf_a3wjv9py3-G8OHlds</recordid><startdate>202310</startdate><enddate>202310</enddate><creator>Paskov, Kelley</creator><creator>Chrisman, Brianna</creator><creator>Stockham, Nathaniel</creator><creator>Washington, Peter Yigitcan</creator><creator>Dunlap, Kaitlyn</creator><creator>Jung, Jae-Yoon</creator><creator>Wall, Dennis P</creator><general>Cold Spring Harbor Laboratory 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-3276-4411</orcidid><orcidid>https://orcid.org/0000-0002-7889-9146</orcidid><orcidid>https://orcid.org/0000-0002-7157-607X</orcidid></search><sort><creationdate>202310</creationdate><title>Identifying crossovers and shared genetic material in whole genome sequencing data from families</title><author>Paskov, Kelley ; Chrisman, Brianna ; Stockham, Nathaniel ; Washington, Peter Yigitcan ; Dunlap, Kaitlyn ; Jung, Jae-Yoon ; Wall, Dennis P</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c377t-2f6669200821bb5b4d997bd1047431db4470c39aab724a83143daf49aa506a003</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Gene mapping</topic><topic>Genome</topic><topic>Genomes</topic><topic>Haplotypes</topic><topic>Heredity</topic><topic>Humans</topic><topic>Inheritance Patterns</topic><topic>Markov chains</topic><topic>Methods</topic><topic>Siblings</topic><topic>Whole Genome Sequencing</topic><topic>X chromosomes</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Paskov, Kelley</creatorcontrib><creatorcontrib>Chrisman, Brianna</creatorcontrib><creatorcontrib>Stockham, Nathaniel</creatorcontrib><creatorcontrib>Washington, Peter Yigitcan</creatorcontrib><creatorcontrib>Dunlap, Kaitlyn</creatorcontrib><creatorcontrib>Jung, Jae-Yoon</creatorcontrib><creatorcontrib>Wall, Dennis P</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Genome research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Paskov, Kelley</au><au>Chrisman, Brianna</au><au>Stockham, Nathaniel</au><au>Washington, Peter Yigitcan</au><au>Dunlap, Kaitlyn</au><au>Jung, Jae-Yoon</au><au>Wall, Dennis P</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Identifying crossovers and shared genetic material in whole genome sequencing data from families</atitle><jtitle>Genome research</jtitle><addtitle>Genome Res</addtitle><date>2023-10</date><risdate>2023</risdate><volume>33</volume><issue>10</issue><spage>1747</spage><epage>1756</epage><pages>1747-1756</pages><issn>1088-9051</issn><eissn>1549-5469</eissn><abstract>Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. 
We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.</abstract><cop>United States</cop><pub>Cold Spring Harbor Laboratory Press</pub><pmid>37879861</pmid><doi>10.1101/gr.277172.122</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-3276-4411</orcidid><orcidid>https://orcid.org/0000-0002-7889-9146</orcidid><orcidid>https://orcid.org/0000-0002-7157-607X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1088-9051
ispartof Genome research, 2023-10, Vol.33 (10), p.1747-1756
issn 1088-9051
1549-5469
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_10691535
source MEDLINE; PubMed Central; Alma/SFX Local Collection
subjects Gene mapping
Genome
Genomes
Haplotypes
Heredity
Humans
Inheritance Patterns
Markov chains
Methods
Siblings
Whole Genome Sequencing
X chromosomes
title Identifying crossovers and shared genetic material in whole genome sequencing data from families
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T05%3A08%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identifying%20crossovers%20and%20shared%20genetic%20material%20in%20whole%20genome%20sequencing%20data%20from%20families&rft.jtitle=Genome%20research&rft.au=Paskov,%20Kelley&rft.date=2023-10&rft.volume=33&rft.issue=10&rft.spage=1747&rft.epage=1756&rft.pages=1747-1756&rft.issn=1088-9051&rft.eissn=1549-5469&rft_id=info:doi/10.1101/gr.277172.122&rft_dat=%3Cproquest_pubme%3E2882321739%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2889687406&rft_id=info:pmid/37879861&rfr_iscdi=true
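
The description field above says crossovers are identified with a hidden Markov model over inherited variants. The following is only a toy sketch of that general idea, not the PhasingFamilies implementation: it assumes a single sibling pair, one parent, and a two-state model (siblings carry the same or a different haplotype from that parent), and it ignores the error-prone regions and inherited deletions that the record says the published model handles. The function names, `switch_prob`, and `error_rate` are hypothetical and chosen for illustration.

```python
import math


def viterbi_shared_haplotype(matches, switch_prob=0.01, error_rate=0.05):
    """Decode the most likely sharing state at each informative site.

    matches[i] is True when both siblings show the same allele from the
    parent at site i (an assumed, pre-computed observation).
    Returns a list of states: 1 = same parental haplotype, 0 = different.
    """
    if not matches:
        return []
    states = (0, 1)
    log_stay = math.log(1 - switch_prob)
    log_switch = math.log(switch_prob)

    def log_emit(state, match):
        # A match is expected in state 1, a mismatch in state 0;
        # anything else is treated as a genotyping error.
        consistent = (state == 1) == match
        return math.log(1 - error_rate) if consistent else math.log(error_rate)

    # Uniform prior over the two states at the first site.
    score = [math.log(0.5) + log_emit(s, matches[0]) for s in states]
    back = []
    for obs in matches[1:]:
        new_score, pointers = [], []
        for s in states:
            candidates = [
                score[p] + (log_stay if p == s else log_switch) for p in states
            ]
            best_prev = max(states, key=lambda p: candidates[p])
            new_score.append(candidates[best_prev] + log_emit(s, obs))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def crossover_intervals(path, positions):
    """Return (left, right) site positions that bracket each state change."""
    return [
        (positions[i - 1], positions[i])
        for i in range(1, len(path))
        if path[i] != path[i - 1]
    ]


# Toy input: one isolated mismatch (likely an error), then a real switch.
matches = [True] * 4 + [False] + [True] * 5 + [False] * 10
positions = list(range(1000, 21000, 1000))
path = viterbi_shared_haplotype(matches)
print(crossover_intervals(path, positions))
```

In this sketch the isolated mismatch is absorbed as an error because two state switches cost more than one unlikely emission, while the sustained run of mismatches is explained by a single crossover. The bracketing interval is the toy analogue of the crossover resolution reported in the abstract.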