Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy

RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically...

Bibliographic details
Published in:	Molecular ecology resources 2015-03, Vol.15 (2), p.329-336
Main authors:	Tin, M. M. Y.; Rheindt, F. E.; Cros, E.; Mikheyev, A. S.
Format:	Article
Language:	eng
Subjects:	Accuracy; Animals; DNA Primers - genetics; Genotype; Genotype & phenotype; genotyping; Genotyping Techniques - methods; Hymenoptera - genetics; methodology; next-generation sequencing; Polymerase Chain Reaction - methods; RAD-seq; RAD-tag; Saccharomyces cerevisiae - genetics; Sequence Analysis, DNA - methods
Online access:	Full text

container_end_page	336
container_issue	2
container_start_page	329
container_title	Molecular ecology resources
container_volume	15
creator	Tin, M. M. Y.; Rheindt, F. E.; Cros, E.; Mikheyev, A. S.
description	RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.
doi_str_mv	10.1111/1755-0998.12314
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1655522355</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1655522355</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</originalsourceid><addsrcrecordid>eNqFkU9v1DAQxSMEon_gzA1Z4sIlbRxnYvuIltIiLS2tWtGb5diTyiWbpLZD2Q_A98Zhu3vgUl_GY__e88gvy97R4oimdUw5QF5IKY5oyWj1Itvfnbzc7cXtXnYQwn1R1IXk1etsrwTKSuBiP_vzGe-wR68jEm31GAdPAj5M2BsMpE2dxYgmuv6OfF9cETuNnTOJDsT1xKOdDNpUR48B-6ijG_qtwayxOmriVqMffiFJLw1xPSIxuuvmW23M5LVZv8letboL-PapHmY3X06uF2f58uL06-LTMjdQQZWjqMsCuLUWdCUZlmgaYbk0FDWHptFNK2pZC0lpS2kBRtSMyUqAYBJp-qHD7OPGN82TRgxRrVww2HW6x2EKitYAUJYMIKEf_kPvh8n3abqZqrgUZS0SdbyhjB9C8Niq0buV9mtFCzUnpOYM1JyH-pdQUrx_8p2aFdodv40kAbABHl2H6-f81LeT861xvtG5EPH3Tqf9T1VzxkH9OD9VIC6v-dnyVl2yvxjCq4s</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1654798268</pqid></control><display><type>article</type><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><source>MEDLINE</source><source>Wiley Online Library Journals Frontfile Complete</source><creator>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</creator><creatorcontrib>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</creatorcontrib><description>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. 
However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. 
Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</description><identifier>ISSN: 1755-098X</identifier><identifier>EISSN: 1755-0998</identifier><identifier>DOI: 10.1111/1755-0998.12314</identifier><identifier>PMID: 25132578</identifier><language>eng</language><publisher>England: Blackwell Publishing Ltd</publisher><subject>Accuracy ; Animals ; DNA Primers - genetics ; Genotype ; Genotype & phenotype ; genotyping ; Genotyping Techniques - methods ; Hymenoptera - genetics ; methodology ; next-generation sequencing ; Polymerase Chain Reaction - methods ; RAD-seq ; RAD-tag ; Saccharomyces cerevisiae - genetics ; Sequence Analysis, DNA - methods</subject><ispartof>Molecular ecology resources, 2015-03, Vol.15 (2), p.329-336</ispartof><rights>2014 John Wiley & Sons Ltd</rights><rights>2014 John Wiley & Sons Ltd.</rights><rights>Copyright © 2015 John Wiley & Sons Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</citedby><cites>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</cites><orcidid>0000-0003-4369-1019</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2F1755-0998.12314$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2F1755-0998.12314$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/25132578$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Tin, M. M. Y.</creatorcontrib><creatorcontrib>Rheindt, F. E.</creatorcontrib><creatorcontrib>Cros, E.</creatorcontrib><creatorcontrib>Mikheyev, A. S.</creatorcontrib><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><title>Molecular ecology resources</title><addtitle>Mol Ecol Resour</addtitle><description>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. 
For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</description><subject>Accuracy</subject><subject>Animals</subject><subject>DNA Primers - genetics</subject><subject>Genotype</subject><subject>Genotype & phenotype</subject><subject>genotyping</subject><subject>Genotyping Techniques - methods</subject><subject>Hymenoptera - genetics</subject><subject>methodology</subject><subject>next-generation sequencing</subject><subject>Polymerase Chain Reaction - methods</subject><subject>RAD-seq</subject><subject>RAD-tag</subject><subject>Saccharomyces cerevisiae - genetics</subject><subject>Sequence Analysis, DNA - methods</subject><issn>1755-098X</issn><issn>1755-0998</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU9v1DAQxSMEon_gzA1Z4sIlbRxnYvuIltIiLS2tWtGb5diTyiWbpLZD2Q_A98Zhu3vgUl_GY__e88gvy97R4oimdUw5QF5IKY5oyWj1Itvfnbzc7cXtXnYQwn1R1IXk1etsrwTKSuBiP_vzGe-wR68jEm31GAdPAj5M2BsMpE2dxYgmuv6OfF9cETuNnTOJDsT1xKOdDNpUR48B-6ijG_qtwayxOmriVqMffiFJLw1xPSIxuuvmW23M5LVZv8letboL-PapHmY3X06uF2f58uL06-LTMjdQQZWjqMsCuLUWdCUZlmgaYbk0FDWHptFNK2pZC0lpS2kBRtSMyUqAYBJp-qHD7OPGN82TRgxRrVww2HW6x2EKitYAUJYMIKEf_kPvh8n3abqZqrgUZS0SdbyhjB9C8Niq0buV9mtFCzUnpOYM1JyH-pdQUrx_8p2aFdodv40kAbABHl2H6-f81LeT861xvtG5EPH3Tqf9T1VzxkH9OD9VIC6v-dnyVl2yvxjCq4s</recordid><startdate>201503</startdate><enddate>201503</enddate><creator>Tin, M. M. Y.</creator><creator>Rheindt, F. 
E.</creator><creator>Cros, E.</creator><creator>Mikheyev, A. S.</creator><general>Blackwell Publishing Ltd</general><general>Wiley Subscription Services, Inc</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SN</scope><scope>7SS</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-4369-1019</orcidid></search><sort><creationdate>201503</creationdate><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><author>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Accuracy</topic><topic>Animals</topic><topic>DNA Primers - genetics</topic><topic>Genotype</topic><topic>Genotype & phenotype</topic><topic>genotyping</topic><topic>Genotyping Techniques - methods</topic><topic>Hymenoptera - genetics</topic><topic>methodology</topic><topic>next-generation sequencing</topic><topic>Polymerase Chain Reaction - methods</topic><topic>RAD-seq</topic><topic>RAD-tag</topic><topic>Saccharomyces cerevisiae - genetics</topic><topic>Sequence Analysis, DNA - methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tin, M. M. Y.</creatorcontrib><creatorcontrib>Rheindt, F. E.</creatorcontrib><creatorcontrib>Cros, E.</creatorcontrib><creatorcontrib>Mikheyev, A. 
S.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Molecular ecology resources</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tin, M. M. Y.</au><au>Rheindt, F. E.</au><au>Cros, E.</au><au>Mikheyev, A. S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</atitle><jtitle>Molecular ecology resources</jtitle><addtitle>Mol Ecol Resour</addtitle><date>2015-03</date><risdate>2015</risdate><volume>15</volume><issue>2</issue><spage>329</spage><epage>336</epage><pages>329-336</pages><issn>1755-098X</issn><eissn>1755-0998</eissn><abstract>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. 
In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</abstract><cop>England</cop><pub>Blackwell Publishing Ltd</pub><pmid>25132578</pmid><doi>10.1111/1755-0998.12314</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0003-4369-1019</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1755-098X
ispartof	Molecular ecology resources, 2015-03, Vol.15 (2), p.329-336
issn	1755-098X 1755-0998
language	eng
recordid	cdi_proquest_miscellaneous_1655522355
source	MEDLINE; Wiley Online Library Journals Frontfile Complete
subjects	Accuracy; Animals; DNA Primers - genetics; Genotype; Genotype & phenotype; genotyping; Genotyping Techniques - methods; Hymenoptera - genetics; methodology; next-generation sequencing; Polymerase Chain Reaction - methods; RAD-seq; RAD-tag; Saccharomyces cerevisiae - genetics; Sequence Analysis, DNA - methods
title	Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T22%3A57%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Degenerate%20adaptor%20sequences%20for%20detecting%20PCR%20duplicates%20in%20reduced%20representation%20sequencing%20data%20improve%20genotype%20calling%20accuracy&rft.jtitle=Molecular%20ecology%20resources&rft.au=Tin,%20M.%20M.%20Y.&rft.date=2015-03&rft.volume=15&rft.issue=2&rft.spage=329&rft.epage=336&rft.pages=329-336&rft.issn=1755-098X&rft.eissn=1755-0998&rft_id=info:doi/10.1111/1755-0998.12314&rft_dat=%3Cproquest_cross%3E1655522355%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1654798268&rft_id=info:pmid/25132578&rfr_iscdi=true
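
Illustrative note: the abstract describes telling PCR duplicates apart in aligned data by combining the alignment start with a four-base tag read from the first index read. The Python sketch below shows one way that idea could be expressed; it is not the authors' pipeline. The `Read` record, its field names, and both function names are hypothetical, and the sketch assumes the tag has already been extracted and attached to each aligned read.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned-read record (not a real library type)."""
    name: str
    chrom: str
    pos: int      # alignment start position
    strand: str   # "+" or "-"
    tag: str      # four-base degenerate adaptor tag from the index read


def remove_pcr_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, pos, strand, tag) group.

    In restriction-digest libraries many independent molecules start at
    the same position, so position alone cannot identify duplicates;
    the tag is what separates them in this sketch.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.tag)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


def duplication_rate(reads: List[Read]) -> float:
    """Fraction of reads flagged as duplicates (0.0 for an empty input)."""
    if not reads:
        return 0.0
    return 1 - len(remove_pcr_duplicates(reads)) / len(reads)


if __name__ == "__main__":
    example = [
        Read("r1", "chr1", 100, "+", "ACGT"),
        Read("r2", "chr1", 100, "+", "ACGT"),  # same site, same tag
        Read("r3", "chr1", 100, "+", "TTGA"),  # same site, different tag
        Read("r4", "chr1", 250, "-", "ACGT"),  # different site
    ]
    print([r.name for r in remove_pcr_duplicates(example)])
    print(duplication_rate(example))
```

A four-base tag allows only 4^4 = 256 distinct sequences, so two independent molecules at the same site can share a tag by chance; a sketch like this would therefore tend to overcount duplicates at very deeply covered sites.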