Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy

RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically...

Bibliographic details
Published in: Molecular ecology resources 2015-03, Vol.15 (2), p.329-336
Main authors: Tin, M. M. Y., Rheindt, F. E., Cros, E., Mikheyev, A. S.
Format: Article
Language: eng
container_end_page 336
container_issue 2
container_start_page 329
container_title Molecular ecology resources
container_volume 15
creator Tin, M. M. Y.
Rheindt, F. E.
Cros, E.
Mikheyev, A. S.
description RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing data, duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.
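The duplicate-discrimination idea in the description can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Read` record, its field names, and the example reads are hypothetical, and the sketch assumes each aligned read already carries its four-base tag and that every tag position is fully degenerate (any of A, C, G, T). Reads that share both an alignment start and a tag are treated as PCR duplicates of one template molecule, while reads that share only the start position are kept, since restriction digestion makes shared starts expected.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read; field names are illustrative only."""
    name: str
    chrom: str
    pos: int      # alignment start position
    strand: str   # "+" or "-"
    tag: str      # four-base degenerate adaptor tag taken from the index read


TAG_LENGTH = 4
# Assumption: each tag position may be any of 4 bases, so 4**4 tags exist.
POSSIBLE_TAGS = 4 ** TAG_LENGTH


def remove_pcr_duplicates(reads: List[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, pos, strand, tag) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.tag)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def duplication_rate(reads: List[Read]) -> float:
    """Fraction of reads discarded as duplicates (0.0 for an empty input)."""
    if not reads:
        return 0.0
    return 1 - len(remove_pcr_duplicates(reads)) / len(reads)


# Invented example: r1 and r2 share position and tag (duplicates);
# r3 starts at the same restriction site but carries a different tag.
example_reads = [
    Read("r1", "chr1", 100, "+", "ACGT"),
    Read("r2", "chr1", 100, "+", "ACGT"),
    Read("r3", "chr1", 100, "+", "TTGA"),
    Read("r4", "chr1", 250, "-", "ACGT"),
]

unique_reads = remove_pcr_duplicates(example_reads)
```

Under the full-degeneracy assumption there are only 256 possible tags, so two distinct template molecules at the same site can occasionally receive the same tag by chance; a sketch like this would then slightly overcount duplicates at deeply covered loci.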
doi_str_mv 10.1111/1755-0998.12314
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1655522355</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1655522355</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</originalsourceid><addsrcrecordid>eNqFkU9v1DAQxSMEon_gzA1Z4sIlbRxnYvuIltIiLS2tWtGb5diTyiWbpLZD2Q_A98Zhu3vgUl_GY__e88gvy97R4oimdUw5QF5IKY5oyWj1Itvfnbzc7cXtXnYQwn1R1IXk1etsrwTKSuBiP_vzGe-wR68jEm31GAdPAj5M2BsMpE2dxYgmuv6OfF9cETuNnTOJDsT1xKOdDNpUR48B-6ijG_qtwayxOmriVqMffiFJLw1xPSIxuuvmW23M5LVZv8letboL-PapHmY3X06uF2f58uL06-LTMjdQQZWjqMsCuLUWdCUZlmgaYbk0FDWHptFNK2pZC0lpS2kBRtSMyUqAYBJp-qHD7OPGN82TRgxRrVww2HW6x2EKitYAUJYMIKEf_kPvh8n3abqZqrgUZS0SdbyhjB9C8Niq0buV9mtFCzUnpOYM1JyH-pdQUrx_8p2aFdodv40kAbABHl2H6-f81LeT861xvtG5EPH3Tqf9T1VzxkH9OD9VIC6v-dnyVl2yvxjCq4s</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1654798268</pqid></control><display><type>article</type><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><source>MEDLINE</source><source>Wiley Online Library Journals Frontfile Complete</source><creator>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</creator><creatorcontrib>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</creatorcontrib><description>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. 
However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. 
Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</description><identifier>ISSN: 1755-098X</identifier><identifier>EISSN: 1755-0998</identifier><identifier>DOI: 10.1111/1755-0998.12314</identifier><identifier>PMID: 25132578</identifier><language>eng</language><publisher>England: Blackwell Publishing Ltd</publisher><subject>Accuracy ; Animals ; DNA Primers - genetics ; Genotype ; Genotype &amp; phenotype ; genotyping ; Genotyping Techniques - methods ; Hymenoptera - genetics ; methodology ; next-generation sequencing ; Polymerase Chain Reaction - methods ; RAD-seq ; RAD-tag ; Saccharomyces cerevisiae - genetics ; Sequence Analysis, DNA - methods</subject><ispartof>Molecular ecology resources, 2015-03, Vol.15 (2), p.329-336</ispartof><rights>2014 John Wiley &amp; Sons Ltd</rights><rights>2014 John Wiley &amp; Sons Ltd.</rights><rights>Copyright © 2015 John Wiley &amp; Sons Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</citedby><cites>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</cites><orcidid>0000-0003-4369-1019</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2F1755-0998.12314$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2F1755-0998.12314$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/25132578$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Tin, M. M. Y.</creatorcontrib><creatorcontrib>Rheindt, F. E.</creatorcontrib><creatorcontrib>Cros, E.</creatorcontrib><creatorcontrib>Mikheyev, A. S.</creatorcontrib><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><title>Molecular ecology resources</title><addtitle>Mol Ecol Resour</addtitle><description>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. 
For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</description><subject>Accuracy</subject><subject>Animals</subject><subject>DNA Primers - genetics</subject><subject>Genotype</subject><subject>Genotype &amp; phenotype</subject><subject>genotyping</subject><subject>Genotyping Techniques - methods</subject><subject>Hymenoptera - genetics</subject><subject>methodology</subject><subject>next-generation sequencing</subject><subject>Polymerase Chain Reaction - methods</subject><subject>RAD-seq</subject><subject>RAD-tag</subject><subject>Saccharomyces cerevisiae - genetics</subject><subject>Sequence Analysis, DNA - methods</subject><issn>1755-098X</issn><issn>1755-0998</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU9v1DAQxSMEon_gzA1Z4sIlbRxnYvuIltIiLS2tWtGb5diTyiWbpLZD2Q_A98Zhu3vgUl_GY__e88gvy97R4oimdUw5QF5IKY5oyWj1Itvfnbzc7cXtXnYQwn1R1IXk1etsrwTKSuBiP_vzGe-wR68jEm31GAdPAj5M2BsMpE2dxYgmuv6OfF9cETuNnTOJDsT1xKOdDNpUR48B-6ijG_qtwayxOmriVqMffiFJLw1xPSIxuuvmW23M5LVZv8letboL-PapHmY3X06uF2f58uL06-LTMjdQQZWjqMsCuLUWdCUZlmgaYbk0FDWHptFNK2pZC0lpS2kBRtSMyUqAYBJp-qHD7OPGN82TRgxRrVww2HW6x2EKitYAUJYMIKEf_kPvh8n3abqZqrgUZS0SdbyhjB9C8Niq0buV9mtFCzUnpOYM1JyH-pdQUrx_8p2aFdodv40kAbABHl2H6-f81LeT861xvtG5EPH3Tqf9T1VzxkH9OD9VIC6v-dnyVl2yvxjCq4s</recordid><startdate>201503</startdate><enddate>201503</enddate><creator>Tin, M. M. Y.</creator><creator>Rheindt, F. 
E.</creator><creator>Cros, E.</creator><creator>Mikheyev, A. S.</creator><general>Blackwell Publishing Ltd</general><general>Wiley Subscription Services, Inc</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SN</scope><scope>7SS</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-4369-1019</orcidid></search><sort><creationdate>201503</creationdate><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><author>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Accuracy</topic><topic>Animals</topic><topic>DNA Primers - genetics</topic><topic>Genotype</topic><topic>Genotype &amp; phenotype</topic><topic>genotyping</topic><topic>Genotyping Techniques - methods</topic><topic>Hymenoptera - genetics</topic><topic>methodology</topic><topic>next-generation sequencing</topic><topic>Polymerase Chain Reaction - methods</topic><topic>RAD-seq</topic><topic>RAD-tag</topic><topic>Saccharomyces cerevisiae - genetics</topic><topic>Sequence Analysis, DNA - methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tin, M. M. Y.</creatorcontrib><creatorcontrib>Rheindt, F. E.</creatorcontrib><creatorcontrib>Cros, E.</creatorcontrib><creatorcontrib>Mikheyev, A. 
S.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Molecular ecology resources</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tin, M. M. Y.</au><au>Rheindt, F. E.</au><au>Cros, E.</au><au>Mikheyev, A. S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</atitle><jtitle>Molecular ecology resources</jtitle><addtitle>Mol Ecol Resour</addtitle><date>2015-03</date><risdate>2015</risdate><volume>15</volume><issue>2</issue><spage>329</spage><epage>336</epage><pages>329-336</pages><issn>1755-098X</issn><eissn>1755-0998</eissn><abstract>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. 
In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</abstract><cop>England</cop><pub>Blackwell Publishing Ltd</pub><pmid>25132578</pmid><doi>10.1111/1755-0998.12314</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0003-4369-1019</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1755-098X
ispartof Molecular ecology resources, 2015-03, Vol.15 (2), p.329-336
issn 1755-098X
1755-0998
language eng
recordid cdi_proquest_miscellaneous_1655522355
source MEDLINE; Wiley Online Library Journals Frontfile Complete
subjects Accuracy
Animals
DNA Primers - genetics
Genotype
Genotype & phenotype
genotyping
Genotyping Techniques - methods
Hymenoptera - genetics
methodology
next-generation sequencing
Polymerase Chain Reaction - methods
RAD-seq
RAD-tag
Saccharomyces cerevisiae - genetics
Sequence Analysis, DNA - methods
title Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T22%3A57%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Degenerate%20adaptor%20sequences%20for%20detecting%20PCR%20duplicates%20in%20reduced%20representation%20sequencing%20data%20improve%20genotype%20calling%20accuracy&rft.jtitle=Molecular%20ecology%20resources&rft.au=Tin,%20M.%20M.%20Y.&rft.date=2015-03&rft.volume=15&rft.issue=2&rft.spage=329&rft.epage=336&rft.pages=329-336&rft.issn=1755-098X&rft.eissn=1755-0998&rft_id=info:doi/10.1111/1755-0998.12314&rft_dat=%3Cproquest_cross%3E1655522355%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1654798268&rft_id=info:pmid/25132578&rfr_iscdi=true