Highly parallelized inference of large genome-based phylogenies

Bibliographic details
Published in: Concurrency and computation 2014-07, Vol. 26 (10), p. 1715-1729
Main authors: Meier-Kolthoff, Jan P., Auch, Alexander F., Klenk, Hans-Peter, Göker, Markus
Format: Article
Language: English
Description
Summary: Genome Blast Distance Phylogeny (GBDP) infers distances and phylogenetic relationships between organisms from completely or partially sequenced genomes. It is well suited for parallelization because pairwise distances are calculated independently. As exemplar data for a high-performance cluster implementation that executes many pairwise genome comparisons in parallel, we used sequences from the Genomic Encyclopedia of Bacteria and Archaea project. Phylogenies were inferred from genome-scale nucleotide and amino acid data with all variants of GBDP, including novel adaptations to amino acid sequences and approaches yielding trees with branch support. We examined in detail how phylogenetic accuracy, average branch support, and performance indicators such as running time and disk space consumption depend on the details of genome comparison, distance calculation, and phylogenetic inference. If combined with conservative measures for branch support, GBDP appears to infer reasonable phylogenetic relationships of microorganisms at a comparatively low computational cost. Owing to the linear speed-up of the cluster, benchmarks reveal an overall computation time of less than 24 h for the 7750 pairwise genome/proteome comparisons of the Genomic Encyclopedia of Bacteria and Archaea data set, compared with an estimated running time of about 30 days for the non-parallelized version. Copyright © 2013 John Wiley & Sons, Ltd.
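The parallelization argument in the summary rests on each pairwise distance being computable independently of all others. The following Python sketch illustrates only that pattern; it is not the authors' cluster implementation. The names `toy_distance` and `pairwise_distances` are hypothetical, and the distance itself is a placeholder (one minus the Jaccard similarity of 3-mers), not a GBDP distance, which would be derived from BLAST-style genome comparisons.

```python
from __future__ import annotations

from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import combinations


def toy_distance(seq_a: str, seq_b: str) -> float:
    """Placeholder distance (NOT GBDP): 1 - Jaccard similarity of 3-mers."""
    k = 3
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        # Both sequences are shorter than k: treat them as identical.
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def pairwise_distances(
    genomes: dict[str, str], executor: Executor
) -> dict[tuple[str, str], float]:
    """Submit one independent task per unordered genome pair."""
    pairs = list(combinations(sorted(genomes), 2))
    # No task depends on another, so all can be scheduled at once.
    futures = {
        pair: executor.submit(toy_distance, genomes[pair[0]], genomes[pair[1]])
        for pair in pairs
    }
    # Collect the results into a sparse distance matrix keyed by pair.
    return {pair: future.result() for pair, future in futures.items()}


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    genomes = {"A": "ACGTACGTAC", "B": "ACGTTCGTAC", "C": "TTTTGGGGCC"}
    with ThreadPoolExecutor(max_workers=4) as pool:
        distances = pairwise_distances(genomes, pool)
    for pair, dist in sorted(distances.items()):
        print(pair, round(dist, 3))
```

A thread pool is used here only to keep the sketch short and portable. For CPU-bound comparisons one would more plausibly pass a `ProcessPoolExecutor` or hand the same list of pairs to a cluster scheduler. Because n genomes yield n(n-1)/2 unordered pairs, the workload grows quadratically with the number of genomes, which is why distributing the pairs matters at this scale.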
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.3112