MitoGeneExtractor: Efficient extraction of mitochondrial genes from next‐generation sequencing libraries

Bibliographic details
Published in: Methods in ecology and evolution 2023-04, Vol. 14 (4), p. 1017-1024
Main authors: Brasseur, Marie V., Astrin, Jonas J., Geiger, Matthias F., Mayer, Christoph
Format: Article
Language: English
Online access: Full text
Description
Summary: Mitochondrial DNA (mtDNA) sequences are often found as byproducts in next‐generation sequencing (NGS) datasets that were originally created to capture genomic or transcriptomic information of an organism. These mtDNA sequences are often discarded, wasting valuable sequencing information. We developed MitoGeneExtractor, an innovative tool that extracts mitochondrial protein-coding genes (PCGs) of interest from NGS libraries through multiple sequence alignments of sequencing reads to amino acid references. General references, for example at the order level, are sufficient for mining mitochondrial PCGs. In a case study, we applied MitoGeneExtractor to recently published genomic datasets of 1993 birds and were able to extract complete or nearly complete sequences for all 13 mitochondrial PCGs for a large proportion of libraries. Compared to an existing assembly-guided sequence reconstruction algorithm, MitoGeneExtractor was faster and substantially more sensitive. We compared COI sequences mined with MitoGeneExtractor to COI databases. In most samples, the mined sequences show high sequence similarity to database records, and their taxonomic assignment matches the assigned morphospecies. In some cases of incongruent taxonomic assignments, we found evidence for contamination in NGS libraries. MitoGeneExtractor allows fast extraction of mitochondrial PCGs from a wide range of NGS datasets. We recommend routinely harvesting and curating mitochondrial sequence information from genomic resources. MitoGeneExtractor output can be used to identify contaminated NGS libraries and to validate the species identity of the sequenced animal based on the extracted COI sequences.
ISSN:2041-210X
DOI:10.1111/2041-210X.14075