Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data

Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for...

Detailed description

Bibliographic details
Published in:	American journal of human genetics 2013-11, Vol.93 (5), p.840-851
Main authors:	Browning, Brian L.; Browning, Sharon R.
Format:	Article
Language:	eng
Subjects:	Alleles; Cohort Studies; Estimating techniques; Europe; European Continental Ancestry Group - genetics; Gene Frequency; Genetics, Population; Genomes; Genotype; Genotype & phenotype; Homozygote; Humans; Lod Score; Models, Genetic; Polymorphism, Single Nucleotide; Probability; Probability distribution; Sequence Analysis, DNA - methods; Software
Online access:	Full text

container_end_page	851
container_issue	5
container_start_page	840
container_title	American journal of human genetics
container_volume	93
creator	Browning, Brian L.; Browning, Sharon R.
description	Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.
doi_str_mv	10.1016/j.ajhg.2013.09.014
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3824133</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0002929713004540</els_id><sourcerecordid>3127581511</sourcerecordid><originalsourceid>FETCH-LOGICAL-c582t-bb26c0ee254957271a759850659e616317c2cec63fb64676698745fa45a302f63</originalsourceid><addsrcrecordid>eNqFkVFrFDEUhYModq3-AR8k4IsvM94kk2QGRCjdtS0UhKrPIZO5s82wO7Mm2cL-ezPdtqgP-hQu-e7hnHsIecugZMDUx6G0w-265MBECU0JrHpGFkwKXSgF8jlZAAAvGt7oE_IqxgGAsRrES3LCKw46Dwtys8SELvlxTa86HJNPB9oe6BKjyxO1Y0dXMfmtvUcucJzSYYd0FcIU6I1NGKkf6Tf8ucfRIV3aZF-TF73dRHzz8J6SH19W388vi-uvF1fnZ9eFkzVPRdty5QCRy6qRmmtmtWxqCUo2qJgSTDvu0CnRt6pSWqmm1pXsbSWtAN4rcUo-H3V3-3aL3ew32I3Zhew2HMxkvfnzZ_S3Zj3dGVHzigmRBT48CIQp-4_JbH2OvdnYEad9NEwypSVnrPo_WklguoFaZvT9X-gw7cOYLzFT9RyM60zxI-XCFGPA_sk3AzO3awYzt2vmdg00Bu5dvPs98dPKY50Z-HQEMN_9zmMw0fm5mM6H3LLpJv8v_V85J7Pv</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1458631727</pqid></control><display><type>article</type><title>Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals Complete</source><source>Cell Press Free Archives</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Browning, Brian L. ; Browning, Sharon R.</creator><creatorcontrib>Browning, Brian L. ; Browning, Sharon R.</creatorcontrib><description>Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. 
We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.</description><identifier>ISSN: 0002-9297</identifier><identifier>EISSN: 1537-6605</identifier><identifier>DOI: 10.1016/j.ajhg.2013.09.014</identifier><identifier>PMID: 24207118</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Alleles ; Cohort Studies ; Estimating techniques ; Europe ; European Continental Ancestry Group - genetics ; Gene Frequency ; Genetics, Population ; Genomes ; Genotype ; Genotype & phenotype ; Homozygote ; Humans ; Lod Score ; Models, Genetic ; Polymorphism, Single Nucleotide ; Probability ; Probability distribution ; Sequence Analysis, DNA - methods ; Software</subject><ispartof>American journal of human genetics, 2013-11, Vol.93 (5), p.840-851</ispartof><rights>2013 The American Society of Human Genetics</rights><rights>Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. 
All rights reserved.</rights><rights>Copyright Cell Press Nov 7, 2013</rights><rights>2013 The American Society of Human Genetics. Published by Elsevier Ltd. All right reserved. 2013 The American Society of Human Genetics</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c582t-bb26c0ee254957271a759850659e616317c2cec63fb64676698745fa45a302f63</citedby><cites>FETCH-LOGICAL-c582t-bb26c0ee254957271a759850659e616317c2cec63fb64676698745fa45a302f63</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3824133/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.ajhg.2013.09.014$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,3550,27924,27925,45995,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/24207118$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Browning, Brian L.</creatorcontrib><creatorcontrib>Browning, Sharon R.</creatorcontrib><title>Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data</title><title>American journal of human genetics</title><addtitle>Am J Hum Genet</addtitle><description>Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. 
We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.</description><subject>Alleles</subject><subject>Cohort Studies</subject><subject>Estimating techniques</subject><subject>Europe</subject><subject>European Continental Ancestry Group - genetics</subject><subject>Gene Frequency</subject><subject>Genetics, Population</subject><subject>Genomes</subject><subject>Genotype</subject><subject>Genotype & phenotype</subject><subject>Homozygote</subject><subject>Humans</subject><subject>Lod Score</subject><subject>Models, Genetic</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Probability</subject><subject>Probability distribution</subject><subject>Sequence Analysis, DNA - 
methods</subject><subject>Software</subject><issn>0002-9297</issn><issn>1537-6605</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkVFrFDEUhYModq3-AR8k4IsvM94kk2QGRCjdtS0UhKrPIZO5s82wO7Mm2cL-ezPdtqgP-hQu-e7hnHsIecugZMDUx6G0w-265MBECU0JrHpGFkwKXSgF8jlZAAAvGt7oE_IqxgGAsRrES3LCKw46Dwtys8SELvlxTa86HJNPB9oe6BKjyxO1Y0dXMfmtvUcucJzSYYd0FcIU6I1NGKkf6Tf8ucfRIV3aZF-TF73dRHzz8J6SH19W388vi-uvF1fnZ9eFkzVPRdty5QCRy6qRmmtmtWxqCUo2qJgSTDvu0CnRt6pSWqmm1pXsbSWtAN4rcUo-H3V3-3aL3ew32I3Zhew2HMxkvfnzZ_S3Zj3dGVHzigmRBT48CIQp-4_JbH2OvdnYEad9NEwypSVnrPo_WklguoFaZvT9X-gw7cOYLzFT9RyM60zxI-XCFGPA_sk3AzO3awYzt2vmdg00Bu5dvPs98dPKY50Z-HQEMN_9zmMw0fm5mM6H3LLpJv8v_V85J7Pv</recordid><startdate>20131107</startdate><enddate>20131107</enddate><creator>Browning, Brian L.</creator><creator>Browning, Sharon R.</creator><general>Elsevier Inc</general><general>Cell Press</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7U7</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>K9.</scope><scope>NAPCQ</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20131107</creationdate><title>Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data</title><author>Browning, Brian L. 
; Browning, Sharon R.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c582t-bb26c0ee254957271a759850659e616317c2cec63fb64676698745fa45a302f63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Alleles</topic><topic>Cohort Studies</topic><topic>Estimating techniques</topic><topic>Europe</topic><topic>European Continental Ancestry Group - genetics</topic><topic>Gene Frequency</topic><topic>Genetics, Population</topic><topic>Genomes</topic><topic>Genotype</topic><topic>Genotype & phenotype</topic><topic>Homozygote</topic><topic>Humans</topic><topic>Lod Score</topic><topic>Models, Genetic</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Probability</topic><topic>Probability distribution</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Browning, Brian L.</creatorcontrib><creatorcontrib>Browning, Sharon R.</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Nursing & Allied Health Premium</collection><collection>Biotechnology and BioEngineering 
Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>American journal of human genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Browning, Brian L.</au><au>Browning, Sharon R.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data</atitle><jtitle>American journal of human genetics</jtitle><addtitle>Am J Hum Genet</addtitle><date>2013-11-07</date><risdate>2013</risdate><volume>93</volume><issue>5</issue><spage>840</spage><epage>851</epage><pages>840-851</pages><issn>0002-9297</issn><eissn>1537-6605</eissn><abstract>Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. 
The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>24207118</pmid><doi>10.1016/j.ajhg.2013.09.014</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0002-9297
ispartof	American journal of human genetics, 2013-11, Vol.93 (5), p.840-851
issn	0002-9297 1537-6605
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3824133
source	MEDLINE; Elsevier ScienceDirect Journals Complete; Cell Press Free Archives; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects	Alleles; Cohort Studies; Estimating techniques; Europe; European Continental Ancestry Group - genetics; Gene Frequency; Genetics, Population; Genomes; Genotype; Genotype & phenotype; Homozygote; Humans; Lod Score; Models, Genetic; Polymorphism, Single Nucleotide; Probability; Probability distribution; Sequence Analysis, DNA - methods; Software
title	Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T06%3A24%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Detecting%20Identity%20by%20Descent%20and%20Estimating%20Genotype%20Error%20Rates%20in%20Sequence%20Data&rft.jtitle=American%20journal%20of%20human%20genetics&rft.au=Browning,%20Brian%C2%A0L.&rft.date=2013-11-07&rft.volume=93&rft.issue=5&rft.spage=840&rft.epage=851&rft.pages=840-851&rft.issn=0002-9297&rft.eissn=1537-6605&rft_id=info:doi/10.1016/j.ajhg.2013.09.014&rft_dat=%3Cproquest_pubme%3E3127581511%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1458631727&rft_id=info:pmid/24207118&rft_els_id=S0002929713004540&rfr_iscdi=true
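
The abstract above states that the ratio of the estimated genotype probabilities under the IBD and non-IBD models gives a LOD score for IBD. The following is a minimal sketch of that idea only, not the IBDseq implementation. The function name `lod_score`, its inputs, and the assumption that per-marker probabilities can be combined by summing independent log ratios are all hypothetical choices made for illustration; the record does not describe how IBDseq estimates or combines these probabilities.

```python
import math


def lod_score(probs_ibd, probs_non_ibd):
    """Return a LOD score from per-marker genotype probabilities.

    probs_ibd[i] and probs_non_ibd[i] are the (hypothetical) probabilities
    of the observed genotype pair at marker i under the IBD and non-IBD
    models. Summing the per-marker log10 ratios assumes the markers are
    independent, which is an illustrative simplification.
    """
    if len(probs_ibd) != len(probs_non_ibd):
        raise ValueError("both models need one probability per marker")
    # LOD = log10 of the likelihood ratio; a product of ratios becomes a sum of logs.
    return sum(math.log10(p / q) for p, q in zip(probs_ibd, probs_non_ibd))


if __name__ == "__main__":
    # Two markers, each ten times more likely under the IBD model.
    print(lod_score([0.5, 0.5], [0.05, 0.05]))
```

A positive score favours the IBD model for the pair of individuals, a negative score favours the non-IBD model, and a score of zero means the observed genotypes are equally probable under both.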