Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy
Published in: | Molecular ecology resources 2015-03, Vol.15 (2), p.329-336 |
---|---|
Main authors: | Tin, M. M. Y.; Rheindt, F. E.; Cros, E.; Mikheyev, A. S. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 336 |
---|---|
container_issue | 2 |
container_start_page | 329 |
container_title | Molecular ecology resources |
container_volume | 15 |
creator | Tin, M. M. Y.; Rheindt, F. E.; Cros, E.; Mikheyev, A. S. |
description | RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material. |
doi_str_mv | 10.1111/1755-0998.12314 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1655522355</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1655522355</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</originalsourceid><addsrcrecordid>eNqFkU9v1DAQxSMEon_gzA1Z4sIlbRxnYvuIltIiLS2tWtGb5diTyiWbpLZD2Q_A98Zhu3vgUl_GY__e88gvy97R4oimdUw5QF5IKY5oyWj1Itvfnbzc7cXtXnYQwn1R1IXk1etsrwTKSuBiP_vzGe-wR68jEm31GAdPAj5M2BsMpE2dxYgmuv6OfF9cETuNnTOJDsT1xKOdDNpUR48B-6ijG_qtwayxOmriVqMffiFJLw1xPSIxuuvmW23M5LVZv8letboL-PapHmY3X06uF2f58uL06-LTMjdQQZWjqMsCuLUWdCUZlmgaYbk0FDWHptFNK2pZC0lpS2kBRtSMyUqAYBJp-qHD7OPGN82TRgxRrVww2HW6x2EKitYAUJYMIKEf_kPvh8n3abqZqrgUZS0SdbyhjB9C8Niq0buV9mtFCzUnpOYM1JyH-pdQUrx_8p2aFdodv40kAbABHl2H6-f81LeT861xvtG5EPH3Tqf9T1VzxkH9OD9VIC6v-dnyVl2yvxjCq4s</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1654798268</pqid></control><display><type>article</type><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><source>MEDLINE</source><source>Wiley Online Library Journals Frontfile Complete</source><creator>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</creator><creatorcontrib>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</creatorcontrib><description>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. 
However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. 
Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</description><identifier>ISSN: 1755-098X</identifier><identifier>EISSN: 1755-0998</identifier><identifier>DOI: 10.1111/1755-0998.12314</identifier><identifier>PMID: 25132578</identifier><language>eng</language><publisher>England: Blackwell Publishing Ltd</publisher><subject>Accuracy ; Animals ; DNA Primers - genetics ; Genotype ; Genotype & phenotype ; genotyping ; Genotyping Techniques - methods ; Hymenoptera - genetics ; methodology ; next-generation sequencing ; Polymerase Chain Reaction - methods ; RAD-seq ; RAD-tag ; Saccharomyces cerevisiae - genetics ; Sequence Analysis, DNA - methods</subject><ispartof>Molecular ecology resources, 2015-03, Vol.15 (2), p.329-336</ispartof><rights>2014 John Wiley & Sons Ltd</rights><rights>2014 John Wiley & Sons Ltd.</rights><rights>Copyright © 2015 John Wiley & Sons Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</citedby><cites>FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</cites><orcidid>0000-0003-4369-1019</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2F1755-0998.12314$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2F1755-0998.12314$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/25132578$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Tin, M. M. Y.</creatorcontrib><creatorcontrib>Rheindt, F. E.</creatorcontrib><creatorcontrib>Cros, E.</creatorcontrib><creatorcontrib>Mikheyev, A. S.</creatorcontrib><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><title>Molecular ecology resources</title><addtitle>Mol Ecol Resour</addtitle><description>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. 
For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</description><subject>Accuracy</subject><subject>Animals</subject><subject>DNA Primers - genetics</subject><subject>Genotype</subject><subject>Genotype & phenotype</subject><subject>genotyping</subject><subject>Genotyping Techniques - methods</subject><subject>Hymenoptera - genetics</subject><subject>methodology</subject><subject>next-generation sequencing</subject><subject>Polymerase Chain Reaction - methods</subject><subject>RAD-seq</subject><subject>RAD-tag</subject><subject>Saccharomyces cerevisiae - genetics</subject><subject>Sequence Analysis, DNA - methods</subject><issn>1755-098X</issn><issn>1755-0998</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU9v1DAQxSMEon_gzA1Z4sIlbRxnYvuIltIiLS2tWtGb5diTyiWbpLZD2Q_A98Zhu3vgUl_GY__e88gvy97R4oimdUw5QF5IKY5oyWj1Itvfnbzc7cXtXnYQwn1R1IXk1etsrwTKSuBiP_vzGe-wR68jEm31GAdPAj5M2BsMpE2dxYgmuv6OfF9cETuNnTOJDsT1xKOdDNpUR48B-6ijG_qtwayxOmriVqMffiFJLw1xPSIxuuvmW23M5LVZv8letboL-PapHmY3X06uF2f58uL06-LTMjdQQZWjqMsCuLUWdCUZlmgaYbk0FDWHptFNK2pZC0lpS2kBRtSMyUqAYBJp-qHD7OPGN82TRgxRrVww2HW6x2EKitYAUJYMIKEf_kPvh8n3abqZqrgUZS0SdbyhjB9C8Niq0buV9mtFCzUnpOYM1JyH-pdQUrx_8p2aFdodv40kAbABHl2H6-f81LeT861xvtG5EPH3Tqf9T1VzxkH9OD9VIC6v-dnyVl2yvxjCq4s</recordid><startdate>201503</startdate><enddate>201503</enddate><creator>Tin, M. M. Y.</creator><creator>Rheindt, F. 
E.</creator><creator>Cros, E.</creator><creator>Mikheyev, A. S.</creator><general>Blackwell Publishing Ltd</general><general>Wiley Subscription Services, Inc</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SN</scope><scope>7SS</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-4369-1019</orcidid></search><sort><creationdate>201503</creationdate><title>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</title><author>Tin, M. M. Y. ; Rheindt, F. E. ; Cros, E. ; Mikheyev, A. S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5454-e862057ddd5a493e2ecb8d79c1ea75bbabf86968911f1105c86339485839e1123</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Accuracy</topic><topic>Animals</topic><topic>DNA Primers - genetics</topic><topic>Genotype</topic><topic>Genotype & phenotype</topic><topic>genotyping</topic><topic>Genotyping Techniques - methods</topic><topic>Hymenoptera - genetics</topic><topic>methodology</topic><topic>next-generation sequencing</topic><topic>Polymerase Chain Reaction - methods</topic><topic>RAD-seq</topic><topic>RAD-tag</topic><topic>Saccharomyces cerevisiae - genetics</topic><topic>Sequence Analysis, DNA - methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tin, M. M. Y.</creatorcontrib><creatorcontrib>Rheindt, F. E.</creatorcontrib><creatorcontrib>Cros, E.</creatorcontrib><creatorcontrib>Mikheyev, A. 
S.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Molecular ecology resources</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tin, M. M. Y.</au><au>Rheindt, F. E.</au><au>Cros, E.</au><au>Mikheyev, A. S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy</atitle><jtitle>Molecular ecology resources</jtitle><addtitle>Mol Ecol Resour</addtitle><date>2015-03</date><risdate>2015</risdate><volume>15</volume><issue>2</issue><spage>329</spage><epage>336</epage><pages>329-336</pages><issn>1755-098X</issn><eissn>1755-0998</eissn><abstract>RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. 
In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material.</abstract><cop>England</cop><pub>Blackwell Publishing Ltd</pub><pmid>25132578</pmid><doi>10.1111/1755-0998.12314</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0003-4369-1019</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1755-098X |
ispartof | Molecular ecology resources, 2015-03, Vol.15 (2), p.329-336 |
issn | 1755-098X; 1755-0998 |
language | eng |
recordid | cdi_proquest_miscellaneous_1655522355 |
source | MEDLINE; Wiley Online Library Journals Frontfile Complete |
subjects | Accuracy; Animals; DNA Primers - genetics; Genotype; Genotype & phenotype; genotyping; Genotyping Techniques - methods; Hymenoptera - genetics; methodology; next-generation sequencing; Polymerase Chain Reaction - methods; RAD-seq; RAD-tag; Saccharomyces cerevisiae - genetics; Sequence Analysis, DNA - methods |
title | Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T22%3A57%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Degenerate%20adaptor%20sequences%20for%20detecting%20PCR%20duplicates%20in%20reduced%20representation%20sequencing%20data%20improve%20genotype%20calling%20accuracy&rft.jtitle=Molecular%20ecology%20resources&rft.au=Tin,%20M.%20M.%20Y.&rft.date=2015-03&rft.volume=15&rft.issue=2&rft.spage=329&rft.epage=336&rft.pages=329-336&rft.issn=1755-098X&rft.eissn=1755-0998&rft_id=info:doi/10.1111/1755-0998.12314&rft_dat=%3Cproquest_cross%3E1655522355%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1654798268&rft_id=info:pmid/25132578&rfr_iscdi=true |
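
The description field above says that a four-base degenerate tag in the first index read allows PCR duplicates to be told apart in aligned data, even though restriction-digested reads all start at the same position. A minimal sketch of that idea follows. It is not the authors' pipeline: the read layout (dicts with the hypothetical keys `chrom`, `start`, `strand`, `tag` and `mapq`) and the rule of keeping the highest-quality read per group are assumptions made for illustration only.

```python
# Illustrative sketch only: reads are plain dicts with hypothetical keys,
# not records produced by any particular aligner or library.

N_POSSIBLE_TAGS = 4 ** 4  # a four-base degenerate tag has 256 possible values


def collapse_duplicates(reads):
    """Keep one read per (chrom, start, strand, tag) group.

    Reads that share an alignment position AND a tag are treated as PCR
    duplicates of one template molecule. Reads that share a position but
    carry different tags are kept as independent molecules.
    Assumption: the read with the highest mapping quality represents the group.
    """
    best = {}
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"], read["tag"])
        if key not in best or read["mapq"] > best[key]["mapq"]:
            best[key] = read
    return list(best.values())


def duplication_rate(reads):
    """Fraction of reads that are removed as duplicates (0.0 for no reads)."""
    reads = list(reads)
    if not reads:
        return 0.0
    return 1.0 - len(collapse_duplicates(reads)) / len(reads)


example_reads = [
    {"chrom": "chr1", "start": 100, "strand": "+", "tag": "ACGT", "mapq": 30},
    {"chrom": "chr1", "start": 100, "strand": "+", "tag": "ACGT", "mapq": 40},
    {"chrom": "chr1", "start": 100, "strand": "+", "tag": "TTGA", "mapq": 35},
    {"chrom": "chr1", "start": 100, "strand": "+", "tag": "ACGT", "mapq": 20},
    {"chrom": "chr2", "start": 100, "strand": "+", "tag": "ACGT", "mapq": 50},
]

unique_reads = collapse_duplicates(example_reads)
rate = duplication_rate(example_reads)
```

With only 256 possible tags, two independent molecules at the same locus can carry the same tag by chance, so a scheme like this may slightly overestimate duplication at deeply covered loci. The duplication rate also serves as a rough indicator of library complexity, which is the quantity the abstract proposes measuring for low-input samples.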