Debar: A sequence‐by‐sequence denoiser for COI‐5P DNA barcode data

DNA barcoding and metabarcoding are now widely used to advance species discovery and biodiversity assessments. High‐throughput sequencing (HTS) has expanded the volume and scope of these analyses, but elevated error rates introduce noise into sequence records that can inflate estimates of biodiversi...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Molecular ecology resources 2021-11, Vol.21 (8), p.2832-2846
Hauptverfasser: Nugent, Cameron M., Elliott, Tyler A., Ratnasingham, Sujeevan, Hebert, Paul D.N., Adamowicz, Sarah J.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:DNA barcoding and metabarcoding are now widely used to advance species discovery and biodiversity assessments. High‐throughput sequencing (HTS) has expanded the volume and scope of these analyses, but elevated error rates introduce noise into sequence records that can inflate estimates of biodiversity. Denoising —the separation of biological signal from instrument (technical) noise—of barcode and metabarcode data currently employs abundance‐based methods which do not capitalize on the highly conserved structure of the cytochrome c oxidase subunit I (COI) region employed as the animal barcode. This manuscript introduces debar, an R package that utilizes a profile hidden Markov model to denoise indel errors in COI sequences introduced by instrument error. In silico studies demonstrated that debar recognized 95% of artificially introduced indels in COI sequences. When applied to real‐world data, debar reduced indel errors in circular consensus sequences obtained with the Sequel platform by 75%, and those generated on the Ion Torrent S5 by 94%. The false correction rate was less than 0.1%, indicating that debar is receptive to the majority of true COI variation in the animal kingdom. In conclusion, the debar package improves DNA barcode and metabarcode workflows by aiding the generation of more accurate sequences aiding the characterization of species diversity.
ISSN:1755-098X
1755-0998
DOI:10.1111/1755-0998.13384