PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events

Identification of homology relationships is essential for inferring gene functions, detecting phylogeny of gene families, discovering evolutionary history of life, and usually, is the first step of many genetic and genomic studies. However, the presence of gene duplicates, variation on evolutionary...

Bibliographic details
Published in: Methods in ecology and evolution 2020-08, Vol.11 (8), p.943-954
Main authors: Zhou, Shengyu, Chen, Yamao, Guo, Chunce, Qi, Ji, Johnston, Susan
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 954
container_issue 8
container_start_page 943
container_title Methods in ecology and evolution
container_volume 11
creator Zhou, Shengyu
Chen, Yamao
Guo, Chunce
Qi, Ji
Johnston, Susan
description Identification of homology relationships is essential for inferring gene functions, detecting phylogeny of gene families, discovering evolutionary history of life, and usually, is the first step of many genetic and genomic studies. However, the presence of gene duplicates, variation on evolutionary rates of homologs, fusion and fission of genes, can lead to misidentification of evolutionary relationships among homologs. Here we provide a Markov clustering based method called PhyloMCL to accurately detect hierarchical orthogroups (HOGs) including orthologs and paralogs, which derived from duplications subsequent to speciation of involved species, by considering both phylogenetic relationship of organisms and effects of polyploidy events. Its performance, evaluated by a list of benchmark gene families, when applying to the clustering of HOGs from 12 Metazoan genomes, reaches up to 87.8% and 83.2% on recall and precision rates respectively. Further application of PhyloMCL on classification of tens of thousands of paralogs, yielded by multiple polyploidy events during evolution of seed plants, successfully identifies the majority of in‐/out‐paralogs at different taxonomic levels. Benefiting from the strategy of Markov clustering and guidance of species tree, PhyloMCL can accurately classify millions of homologous genes with affordable time, meeting the challenge of phylogenomic studies upon rapid increasing of sequenced genomes.
doi_str_mv 10.1111/2041-210X.13401
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2429612933</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2429612933</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3571-7abaa8494fdecb2d9656dbc2dad420bb1558067f224bc5f21a50e8a3dcdfbe283</originalsourceid><addsrcrecordid>eNqFkEtLw0AUhYMoWGrXbgdcp52ZPJq4K6U-oEUXCu6GedxJpsRMnEmU7PzpJkbEnWdzL5dzzoUvCC4JXpJBK4pjElKCX5YkijE5CWa_l9M_-3mw8P6IB0VZjmk8Cz4fy76yh-3-Gm2k7BxvAcmq8y04UxfIalQacNzJ0kheIeva0hbOdo1HRWcUKCR61IwdBdTQGokcVLw1tvalaRCvFTK1Bge1hLGtsVXfVNaoHsE71K2_CM40rzwsfuY8eL7ZPW3vwv3D7f12sw9llKxJuOaC8yzOY61ACqryNEmVkFRxFVMsBEmSDKdrTWksZKIp4QmGjEdKKi2AZtE8uJp6G2ffOvAtO9rO1cNLRmOap4TmUTS4VpNLOuu9A80aZ1656xnBbCTNRpZsZMm-SQ-JdEp8mAr6_-zssNtFU_ALy_WDrQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2429612933</pqid></control><display><type>article</type><title>PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events</title><source>Wiley Online Library Journals Frontfile Complete</source><source>Alma/SFX Local Collection</source><creator>Zhou, Shengyu ; Chen, Yamao ; Guo, Chunce ; Qi, Ji ; Johnston, Susan</creator><creatorcontrib>Zhou, Shengyu ; Chen, Yamao ; Guo, Chunce ; Qi, Ji ; Johnston, Susan</creatorcontrib><description>Identification of homology relationships is essential for inferring gene functions, detecting phylogeny of gene families, discovering evolutionary history of life, and usually, is the first step of many genetic and genomic studies. However, the presence of gene duplicates, variation on evolutionary rates of homologs, fusion and fission of genes, can lead to misidentification of evolutionary relationships among homologs. 
Here we provide a Markov clustering based method called PhyloMCL to accurately detect hierarchical orthogroups (HOGs) including orthologs and paralogs, which derived from duplications subsequent to speciation of involved species, by considering both phylogenetic relationship of organisms and effects of polyploidy events. Its performance, evaluated by a list of benchmark gene families, when applying to the clustering of HOGs from 12 Metazoan genomes, reaches up to 87.8% and 83.2% on recall and precision rates respectively. Further application of PhyloMCL on classification of tens of thousands of paralogs, yielded by multiple polyploidy events during evolution of seed plants, successfully identifies the majority of in‐/out‐paralogs at different taxonomic levels. Benefiting from the strategy of Markov clustering and guidance of species tree, PhyloMCL can accurately classify millions of homologous genes with affordable time, meeting the challenge of phylogenomic studies upon rapid increasing of sequenced genomes.</description><identifier>ISSN: 2041-210X</identifier><identifier>EISSN: 2041-210X</identifier><identifier>DOI: 10.1111/2041-210X.13401</identifier><language>eng</language><publisher>London: John Wiley &amp; Sons, Inc</publisher><subject>Clustering ; Evolution ; Evolutionary genetics ; gene duplication ; Gene families ; Genes ; Genomes ; hierarchical orthogroup classification ; Homology ; Markov clustering ; paralogs ; Phylogenetics ; phylogenomics ; Phylogeny ; Polyploidy ; Reproduction (copying) ; Speciation ; Swine</subject><ispartof>Methods in ecology and evolution, 2020-08, Vol.11 (8), p.943-954</ispartof><rights>2020 British Ecological Society</rights><rights>Methods in Ecology and Evolution © 2020 British Ecological 
Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3571-7abaa8494fdecb2d9656dbc2dad420bb1558067f224bc5f21a50e8a3dcdfbe283</citedby><cites>FETCH-LOGICAL-c3571-7abaa8494fdecb2d9656dbc2dad420bb1558067f224bc5f21a50e8a3dcdfbe283</cites><orcidid>0000-0003-3376-1116 ; 0000-0001-8716-7716 ; 0000-0001-7135-0524 ; 0000-0002-9472-4936</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2F2041-210X.13401$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2F2041-210X.13401$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Zhou, Shengyu</creatorcontrib><creatorcontrib>Chen, Yamao</creatorcontrib><creatorcontrib>Guo, Chunce</creatorcontrib><creatorcontrib>Qi, Ji</creatorcontrib><creatorcontrib>Johnston, Susan</creatorcontrib><title>PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events</title><title>Methods in ecology and evolution</title><description>Identification of homology relationships is essential for inferring gene functions, detecting phylogeny of gene families, discovering evolutionary history of life, and usually, is the first step of many genetic and genomic studies. However, the presence of gene duplicates, variation on evolutionary rates of homologs, fusion and fission of genes, can lead to misidentification of evolutionary relationships among homologs. 
Here we provide a Markov clustering based method called PhyloMCL to accurately detect hierarchical orthogroups (HOGs) including orthologs and paralogs, which derived from duplications subsequent to speciation of involved species, by considering both phylogenetic relationship of organisms and effects of polyploidy events. Its performance, evaluated by a list of benchmark gene families, when applying to the clustering of HOGs from 12 Metazoan genomes, reaches up to 87.8% and 83.2% on recall and precision rates respectively. Further application of PhyloMCL on classification of tens of thousands of paralogs, yielded by multiple polyploidy events during evolution of seed plants, successfully identifies the majority of in‐/out‐paralogs at different taxonomic levels. Benefiting from the strategy of Markov clustering and guidance of species tree, PhyloMCL can accurately classify millions of homologous genes with affordable time, meeting the challenge of phylogenomic studies upon rapid increasing of sequenced genomes.</description><subject>Clustering</subject><subject>Evolution</subject><subject>Evolutionary genetics</subject><subject>gene duplication</subject><subject>Gene families</subject><subject>Genes</subject><subject>Genomes</subject><subject>hierarchical orthogroup classification</subject><subject>Homology</subject><subject>Markov clustering</subject><subject>paralogs</subject><subject>Phylogenetics</subject><subject>phylogenomics</subject><subject>Phylogeny</subject><subject>Polyploidy</subject><subject>Reproduction 
(copying)</subject><subject>Speciation</subject><subject>Swine</subject><issn>2041-210X</issn><issn>2041-210X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNqFkEtLw0AUhYMoWGrXbgdcp52ZPJq4K6U-oEUXCu6GedxJpsRMnEmU7PzpJkbEnWdzL5dzzoUvCC4JXpJBK4pjElKCX5YkijE5CWa_l9M_-3mw8P6IB0VZjmk8Cz4fy76yh-3-Gm2k7BxvAcmq8y04UxfIalQacNzJ0kheIeva0hbOdo1HRWcUKCR61IwdBdTQGokcVLw1tvalaRCvFTK1Bge1hLGtsVXfVNaoHsE71K2_CM40rzwsfuY8eL7ZPW3vwv3D7f12sw9llKxJuOaC8yzOY61ACqryNEmVkFRxFVMsBEmSDKdrTWksZKIp4QmGjEdKKi2AZtE8uJp6G2ffOvAtO9rO1cNLRmOap4TmUTS4VpNLOuu9A80aZ1656xnBbCTNRpZsZMm-SQ-JdEp8mAr6_-zssNtFU_ALy_WDrQ</recordid><startdate>202008</startdate><enddate>202008</enddate><creator>Zhou, Shengyu</creator><creator>Chen, Yamao</creator><creator>Guo, Chunce</creator><creator>Qi, Ji</creator><creator>Johnston, Susan</creator><general>John Wiley &amp; Sons, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7QG</scope><scope>7SN</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><orcidid>https://orcid.org/0000-0003-3376-1116</orcidid><orcidid>https://orcid.org/0000-0001-8716-7716</orcidid><orcidid>https://orcid.org/0000-0001-7135-0524</orcidid><orcidid>https://orcid.org/0000-0002-9472-4936</orcidid></search><sort><creationdate>202008</creationdate><title>PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events</title><author>Zhou, Shengyu ; Chen, Yamao ; Guo, Chunce ; Qi, Ji ; Johnston, Susan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3571-7abaa8494fdecb2d9656dbc2dad420bb1558067f224bc5f21a50e8a3dcdfbe283</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Clustering</topic><topic>Evolution</topic><topic>Evolutionary genetics</topic><topic>gene 
duplication</topic><topic>Gene families</topic><topic>Genes</topic><topic>Genomes</topic><topic>hierarchical orthogroup classification</topic><topic>Homology</topic><topic>Markov clustering</topic><topic>paralogs</topic><topic>Phylogenetics</topic><topic>phylogenomics</topic><topic>Phylogeny</topic><topic>Polyploidy</topic><topic>Reproduction (copying)</topic><topic>Speciation</topic><topic>Swine</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Shengyu</creatorcontrib><creatorcontrib>Chen, Yamao</creatorcontrib><creatorcontrib>Guo, Chunce</creatorcontrib><creatorcontrib>Qi, Ji</creatorcontrib><creatorcontrib>Johnston, Susan</creatorcontrib><collection>CrossRef</collection><collection>Animal Behavior Abstracts</collection><collection>Ecology Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><jtitle>Methods in ecology and evolution</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhou, Shengyu</au><au>Chen, Yamao</au><au>Guo, Chunce</au><au>Qi, Ji</au><au>Johnston, Susan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events</atitle><jtitle>Methods in ecology and evolution</jtitle><date>2020-08</date><risdate>2020</risdate><volume>11</volume><issue>8</issue><spage>943</spage><epage>954</epage><pages>943-954</pages><issn>2041-210X</issn><eissn>2041-210X</eissn><abstract>Identification of homology relationships is essential for inferring gene functions, detecting phylogeny of gene families, discovering evolutionary history of life, 
and usually, is the first step of many genetic and genomic studies. However, the presence of gene duplicates, variation on evolutionary rates of homologs, fusion and fission of genes, can lead to misidentification of evolutionary relationships among homologs. Here we provide a Markov clustering based method called PhyloMCL to accurately detect hierarchical orthogroups (HOGs) including orthologs and paralogs, which derived from duplications subsequent to speciation of involved species, by considering both phylogenetic relationship of organisms and effects of polyploidy events. Its performance, evaluated by a list of benchmark gene families, when applying to the clustering of HOGs from 12 Metazoan genomes, reaches up to 87.8% and 83.2% on recall and precision rates respectively. Further application of PhyloMCL on classification of tens of thousands of paralogs, yielded by multiple polyploidy events during evolution of seed plants, successfully identifies the majority of in‐/out‐paralogs at different taxonomic levels. Benefiting from the strategy of Markov clustering and guidance of species tree, PhyloMCL can accurately classify millions of homologous genes with affordable time, meeting the challenge of phylogenomic studies upon rapid increasing of sequenced genomes.</abstract><cop>London</cop><pub>John Wiley &amp; Sons, Inc</pub><doi>10.1111/2041-210X.13401</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-3376-1116</orcidid><orcidid>https://orcid.org/0000-0001-8716-7716</orcidid><orcidid>https://orcid.org/0000-0001-7135-0524</orcidid><orcidid>https://orcid.org/0000-0002-9472-4936</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2041-210X
ispartof Methods in ecology and evolution, 2020-08, Vol.11 (8), p.943-954
issn 2041-210X
2041-210X
language eng
recordid cdi_proquest_journals_2429612933
source Wiley Online Library Journals Frontfile Complete; Alma/SFX Local Collection
subjects Clustering
Evolution
Evolutionary genetics
gene duplication
Gene families
Genes
Genomes
hierarchical orthogroup classification
Homology
Markov clustering
paralogs
Phylogenetics
phylogenomics
Phylogeny
Polyploidy
Reproduction (copying)
Speciation
Swine
title PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T22%3A17%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PhyloMCL:%20Accurate%20clustering%20of%20hierarchical%20orthogroups%20guided%20by%20phylogenetic%20relationship%20and%20inference%20of%20polyploidy%20events&rft.jtitle=Methods%20in%20ecology%20and%20evolution&rft.au=Zhou,%20Shengyu&rft.date=2020-08&rft.volume=11&rft.issue=8&rft.spage=943&rft.epage=954&rft.pages=943-954&rft.issn=2041-210X&rft.eissn=2041-210X&rft_id=info:doi/10.1111/2041-210X.13401&rft_dat=%3Cproquest_cross%3E2429612933%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2429612933&rft_id=info:pmid/&rfr_iscdi=true