Accurate annotation of protein-coding genes in mitochondrial genomes

Bibliographic details
Published in: Molecular phylogenetics and evolution, 2017-01, Vol. 106, pp. 209-216
Authors: Al Arab, Marwa; Höner zu Siederdissen, Christian; Tout, Kifah; Sahyoun, Abdullah H.; Stadler, Peter F.; Bernt, Matthias
Format: Article
Language: English
Description
Abstract:
• An automated pipeline for fast de novo annotation of mitochondrial protein-coding genes is presented.
• The method generates taxon-specific enhanced multiple sequence alignments (MSAs) and corresponding HMMs.
• An automatic frameshift correction method.
• A detailed analysis of the frameshifts in nad3 of the Testudines-Archosauria.
Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high-quality annotations are a prerequisite for downstream analyses. We therefore present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced, phylogeny-aware hidden Markov models (HMMs). Using an approximation of the phylogeny, the pipeline builds taxon-specific enhanced multiple sequence alignments (MSAs) of already annotated sequences, together with the corresponding HMMs. The MSAs are enhanced by fixing unannotated frameshifts, purging erroneous sequences, and removing non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which were previously unknown. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group has been conducted.
ISSN: 1055-7903, 1095-9513
DOI: 10.1016/j.ympev.2016.09.024
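The abstract lists removing non-conserved columns from both ends of each MSA as one enhancement step. The minimal Python sketch below illustrates that idea under stated assumptions. The conservation score (fraction of all sequences sharing the most common non-gap residue), the default threshold of 0.5, and the helper names `column_conservation` and `trim_alignment_ends` are hypothetical choices made for illustration. They are not the criterion or code used by the authors.

```python
from collections import Counter


def column_conservation(column: str) -> float:
    """Fraction of sequences sharing the most common non-gap residue.

    Assumption for illustration: gaps ('-') never count as the majority
    residue but do count in the denominator, so gap-rich columns score low.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)


def trim_alignment_ends(msa: list[str], min_conservation: float = 0.5) -> list[str]:
    """Drop columns from both ends of an MSA until a conserved column is reached.

    Hypothetical helper: the score and threshold are illustrative, not the
    paper's criterion. Interior columns are never removed, even if poorly
    conserved.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    # Build each column as a string and flag whether it passes the threshold.
    columns = ["".join(seq[i] for seq in msa) for i in range(length)]
    conserved = [column_conservation(col) >= min_conservation for col in columns]

    # Scan inward from the left and right ends to the first conserved column.
    start = 0
    while start < length and not conserved[start]:
        start += 1
    end = length
    while end > start and not conserved[end - 1]:
        end -= 1
    return [seq[start:end] for seq in msa]


if __name__ == "__main__":
    toy_msa = ["--ATGCTA-", "-CATGCTAG", "GTATGCTTC", "--ATGATA-"]
    print(trim_alignment_ends(toy_msa))
    # → ['ATGCTA', 'ATGCTA', 'ATGCTT', 'ATGATA']
```

In the toy alignment, the two leading columns and the last column are mostly gaps, so they fall below the assumed threshold and are trimmed, while the fully conserved core is kept. Only the ends are trimmed, matching the abstract's description, so a poorly conserved column in the middle of the alignment would stay in place.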