PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events
Published in: | Methods in Ecology and Evolution 2020-08, Vol.11 (8), p.943-954 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Identification of homology relationships is essential for inferring gene functions, reconstructing the phylogeny of gene families, and uncovering the evolutionary history of life, and it is usually the first step of many genetic and genomic studies. However, the presence of gene duplicates, variation in evolutionary rates among homologs, and fusion and fission of genes can lead to misidentification of evolutionary relationships among homologs.
Here we provide a Markov clustering-based method called PhyloMCL to accurately detect hierarchical orthogroups (HOGs), which include orthologs and paralogs derived from duplications subsequent to the speciation of the species involved, by considering both the phylogenetic relationships of organisms and the effects of polyploidy events.
Its performance, evaluated against a list of benchmark gene families when applied to the clustering of HOGs from 12 Metazoan genomes, reaches up to 87.8% recall and 83.2% precision. Further application of PhyloMCL to the classification of tens of thousands of paralogs, produced by multiple polyploidy events during the evolution of seed plants, successfully identifies the majority of in‐/out‐paralogs at different taxonomic levels.
Benefiting from the Markov clustering strategy and the guidance of a species tree, PhyloMCL can accurately classify millions of homologous genes in affordable time, meeting the challenge that the rapid increase in sequenced genomes poses for phylogenomic studies. |
---|---|
ISSN: | 2041-210X |
DOI: | 10.1111/2041-210X.13401 |
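
The abstract names Markov clustering as the strategy underlying PhyloMCL but does not describe the procedure itself. As a rough illustration only, the sketch below runs plain, generic Markov clustering (alternating expansion and inflation) on a small similarity matrix. It is not PhyloMCL: it uses no species tree and no polyploidy information, and the function name `mcl`, its parameters, and the toy matrix are assumptions made for this example.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, iterations=50, tol=1e-9):
    """Generic Markov clustering; `adj` is a symmetric, non-negative matrix.

    Hypothetical helper for illustration, not the PhyloMCL implementation.
    Returns clusters as sorted lists of node indices.
    """
    n = len(adj)
    # Self-loops guarantee that every column has a non-zero sum.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong connections and weakens weak ones.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries mass lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy input: genes 0-2 are mutually similar, genes 3-4 are similar to each
# other, and the two groups share no similarity.
similarity = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(similarity))
```

In practice the matrix entries would come from sequence similarity scores, and a larger `inflation` value tends to yield smaller, tighter clusters.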