Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies

Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase...

Detailed description

Bibliographic details
Published in: Systematic biology 2008-08, Vol.57 (4), p.613-627
Main authors: Aguileta, G., Marthey, S., Chiapello, H., Lebrun, M.-H., Rodolphe, F., Fournier, E., Gendrault-Jacquemard, A., Giraud, T.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 627
container_issue 4
container_start_page 613
container_title Systematic biology
container_volume 57
creator Aguileta, G.
Marthey, S.
Chiapello, H.
Lebrun, M.-H.
Rodolphe, F.
Fournier, E.
Gendrault-Jacquemard, A.
Giraud, T.
description Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. 
Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.
doi_str_mv 10.1080/10635150802306527
format Article
fullrecord <record><control><sourceid>jstor_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_02333207v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>27756379</jstor_id><oup_id>10.1080/10635150802306527</oup_id><sourcerecordid>27756379</sourcerecordid><originalsourceid>FETCH-LOGICAL-c567t-821bb241c33dcd5c9fbc95b74b80822301acbd4b754377c938d207cc5a86fc9c3</originalsourceid><addsrcrecordid>eNqNkU-P0zAQxS0EYpeFD8ABFHFBSATsTPwnx1JBCyqiLIu04mLFzmSbbhoXO1nRb4-jVEWCCyeP5v3eG42HkKeMvmFU0beMCuCMxzIDKngm75FzRqVIFYjr-2MtII2APCOPQthSypjg7CE5Y0rSghfFOVnPQsAQmu4m6TeYrNHXzu_KzmLi6uRb7LeYzt3-kCyww5BENblE6-7Qj55LZ4bQJ-vNoXU32DUYHpMHddkGfHJ8L8j3D--v5st09WXxcT5bpZYL2acqY8ZkObMAla24LWpjC25kbhRVWVyHldZUuZE8ByltAarKqLSWl0rUtrBwQV5NuZuy1Xvf7Ep_0K5s9HK20mMv_glA9NyxyL6c2L13PwcMvd41wWLblh26IWgJIITiMJIv_iK3bvBdXESzIpcSlMgixCbIeheCx_o0n1E9Hkb_c5joeX4MHswOqz-O4yUi8HoC3LD_r7xnE74NvfMnQyYlFyDHuHTSm9Djr5Ne-lstJEiul9c_9PLT56vFV_VOL-A3Vyurng</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>194773862</pqid></control><display><type>article</type><title>Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies</title><source>MEDLINE</source><source>Jstor Complete Legacy</source><source>Oxford University Press Journals All Titles (1996-Current)</source><creator>Aguileta, G. ; Marthey, S. ; Chiapello, H. ; Lebrun, M.-H. ; Rodolphe, F. ; Fournier, E. ; Gendrault-Jacquemard, A. ; Giraud, T.</creator><contributor>Ané, Cécile ; Ané, Cécile</contributor><creatorcontrib>Aguileta, G. ; Marthey, S. ; Chiapello, H. ; Lebrun, M.-H. ; Rodolphe, F. ; Fournier, E. ; Gendrault-Jacquemard, A. ; Giraud, T. ; Ané, Cécile ; Ané, Cécile</creatorcontrib><description>Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. 
Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. 
We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.</description><identifier>ISSN: 1063-5157</identifier><identifier>EISSN: 1076-836X</identifier><identifier>DOI: 10.1080/10635150802306527</identifier><identifier>PMID: 18709599</identifier><language>eng</language><publisher>England: Taylor &amp; Francis</publisher><subject>Algorithms ; Ascomycota ; Basidiomycota ; Biodiversity ; Biological taxonomies ; Classification - methods ; Datasets ; Evolution ; Fungal genomes ; Fungi ; Fungi - classification ; Fungi - genetics ; FUNYBASE ; Genes ; Genes, Fungal - genetics ; Genomes ; Genomics ; incongruence ; Life Sciences ; Likelihood Functions ; Markov analysis ; Multigene Family ; multigene phylogenies ; phylogenetic informativeness ; Phylogenetics ; Phylogeny ; Populations and Evolution ; Taxonomy ; topological score ; Topology ; tree of life ; Web site</subject><ispartof>Systematic biology, 2008-08, Vol.57 (4), p.613-627</ispartof><rights>Copyright 2008 Society of Systematic Biologists</rights><rights>2008 Society of Systematic Biologists 2008</rights><rights>Copyright Taylor &amp; Francis Ltd. 
Aug 2008</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c567t-821bb241c33dcd5c9fbc95b74b80822301acbd4b754377c938d207cc5a86fc9c3</citedby><cites>FETCH-LOGICAL-c567t-821bb241c33dcd5c9fbc95b74b80822301acbd4b754377c938d207cc5a86fc9c3</cites><orcidid>0000-0002-2685-6478 ; 0000-0003-1562-1902 ; 0000-0001-6959-1855</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/27756379$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/27756379$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>230,314,776,780,799,881,27903,27904,57996,58229</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/18709599$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://hal.science/hal-02333207$$DView record in HAL$$Hfree_for_read</backlink></links><search><contributor>Ané, Cécile</contributor><contributor>Ané, Cécile</contributor><creatorcontrib>Aguileta, G.</creatorcontrib><creatorcontrib>Marthey, S.</creatorcontrib><creatorcontrib>Chiapello, H.</creatorcontrib><creatorcontrib>Lebrun, M.-H.</creatorcontrib><creatorcontrib>Rodolphe, F.</creatorcontrib><creatorcontrib>Fournier, E.</creatorcontrib><creatorcontrib>Gendrault-Jacquemard, A.</creatorcontrib><creatorcontrib>Giraud, T.</creatorcontrib><title>Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies</title><title>Systematic biology</title><addtitle>Syst Biol</addtitle><description>Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. 
Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. 
We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.</description><subject>Algorithms</subject><subject>Ascomycota</subject><subject>Basidiomycota</subject><subject>Biodiversity</subject><subject>Biological taxonomies</subject><subject>Classification - methods</subject><subject>Datasets</subject><subject>Evolution</subject><subject>Fungal genomes</subject><subject>Fungi</subject><subject>Fungi - classification</subject><subject>Fungi - genetics</subject><subject>FUNYBASE</subject><subject>Genes</subject><subject>Genes, Fungal - genetics</subject><subject>Genomes</subject><subject>Genomics</subject><subject>incongruence</subject><subject>Life Sciences</subject><subject>Likelihood Functions</subject><subject>Markov analysis</subject><subject>Multigene Family</subject><subject>multigene phylogenies</subject><subject>phylogenetic informativeness</subject><subject>Phylogenetics</subject><subject>Phylogeny</subject><subject>Populations and Evolution</subject><subject>Taxonomy</subject><subject>topological score</subject><subject>Topology</subject><subject>tree of life</subject><subject>Web 
site</subject><issn>1063-5157</issn><issn>1076-836X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkU-P0zAQxS0EYpeFD8ABFHFBSATsTPwnx1JBCyqiLIu04mLFzmSbbhoXO1nRb4-jVEWCCyeP5v3eG42HkKeMvmFU0beMCuCMxzIDKngm75FzRqVIFYjr-2MtII2APCOPQthSypjg7CE5Y0rSghfFOVnPQsAQmu4m6TeYrNHXzu_KzmLi6uRb7LeYzt3-kCyww5BENblE6-7Qj55LZ4bQJ-vNoXU32DUYHpMHddkGfHJ8L8j3D--v5st09WXxcT5bpZYL2acqY8ZkObMAla24LWpjC25kbhRVWVyHldZUuZE8ByltAarKqLSWl0rUtrBwQV5NuZuy1Xvf7Ep_0K5s9HK20mMv_glA9NyxyL6c2L13PwcMvd41wWLblh26IWgJIITiMJIv_iK3bvBdXESzIpcSlMgixCbIeheCx_o0n1E9Hkb_c5joeX4MHswOqz-O4yUi8HoC3LD_r7xnE74NvfMnQyYlFyDHuHTSm9Djr5Ne-lstJEiul9c_9PLT56vFV_VOL-A3Vyurng</recordid><startdate>200808</startdate><enddate>200808</enddate><creator>Aguileta, G.</creator><creator>Marthey, S.</creator><creator>Chiapello, H.</creator><creator>Lebrun, M.-H.</creator><creator>Rodolphe, F.</creator><creator>Fournier, E.</creator><creator>Gendrault-Jacquemard, A.</creator><creator>Giraud, T.</creator><general>Taylor &amp; Francis</general><general>Taylor &amp; Francis Group</general><general>Oxford University Press</general><general>Oxford University Press (OUP)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>K9.</scope><scope>7X8</scope><scope>1XC</scope><scope>VOOES</scope><orcidid>https://orcid.org/0000-0002-2685-6478</orcidid><orcidid>https://orcid.org/0000-0003-1562-1902</orcidid><orcidid>https://orcid.org/0000-0001-6959-1855</orcidid></search><sort><creationdate>200808</creationdate><title>Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies</title><author>Aguileta, G. ; Marthey, S. ; Chiapello, H. ; Lebrun, M.-H. ; Rodolphe, F. ; Fournier, E. ; Gendrault-Jacquemard, A. 
; Giraud, T.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c567t-821bb241c33dcd5c9fbc95b74b80822301acbd4b754377c938d207cc5a86fc9c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Ascomycota</topic><topic>Basidiomycota</topic><topic>Biodiversity</topic><topic>Biological taxonomies</topic><topic>Classification - methods</topic><topic>Datasets</topic><topic>Evolution</topic><topic>Fungal genomes</topic><topic>Fungi</topic><topic>Fungi - classification</topic><topic>Fungi - genetics</topic><topic>FUNYBASE</topic><topic>Genes</topic><topic>Genes, Fungal - genetics</topic><topic>Genomes</topic><topic>Genomics</topic><topic>incongruence</topic><topic>Life Sciences</topic><topic>Likelihood Functions</topic><topic>Markov analysis</topic><topic>Multigene Family</topic><topic>multigene phylogenies</topic><topic>phylogenetic informativeness</topic><topic>Phylogenetics</topic><topic>Phylogeny</topic><topic>Populations and Evolution</topic><topic>Taxonomy</topic><topic>topological score</topic><topic>Topology</topic><topic>tree of life</topic><topic>Web site</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Aguileta, G.</creatorcontrib><creatorcontrib>Marthey, S.</creatorcontrib><creatorcontrib>Chiapello, H.</creatorcontrib><creatorcontrib>Lebrun, M.-H.</creatorcontrib><creatorcontrib>Rodolphe, F.</creatorcontrib><creatorcontrib>Fournier, E.</creatorcontrib><creatorcontrib>Gendrault-Jacquemard, A.</creatorcontrib><creatorcontrib>Giraud, T.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Health &amp; Medical Complete 
(Alumni)</collection><collection>MEDLINE - Academic</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><jtitle>Systematic biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Aguileta, G.</au><au>Marthey, S.</au><au>Chiapello, H.</au><au>Lebrun, M.-H.</au><au>Rodolphe, F.</au><au>Fournier, E.</au><au>Gendrault-Jacquemard, A.</au><au>Giraud, T.</au><au>Ané, Cécile</au><au>Ané, Cécile</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies</atitle><jtitle>Systematic biology</jtitle><addtitle>Syst Biol</addtitle><date>2008-08</date><risdate>2008</risdate><volume>57</volume><issue>4</issue><spage>613</spage><epage>627</epage><pages>613-627</pages><issn>1063-5157</issn><eissn>1076-836X</eissn><abstract>Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). 
Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.</abstract><cop>England</cop><pub>Taylor &amp; Francis</pub><pmid>18709599</pmid><doi>10.1080/10635150802306527</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0002-2685-6478</orcidid><orcidid>https://orcid.org/0000-0003-1562-1902</orcidid><orcidid>https://orcid.org/0000-0001-6959-1855</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1063-5157
ispartof Systematic biology, 2008-08, Vol.57 (4), p.613-627
issn 1063-5157
1076-836X
language eng
recordid cdi_hal_primary_oai_HAL_hal_02333207v1
source MEDLINE; Jstor Complete Legacy; Oxford University Press Journals All Titles (1996-Current)
subjects Algorithms
Ascomycota
Basidiomycota
Biodiversity
Biological taxonomies
Classification - methods
Datasets
Evolution
Fungal genomes
Fungi
Fungi - classification
Fungi - genetics
FUNYBASE
Genes
Genes, Fungal - genetics
Genomes
Genomics
incongruence
Life Sciences
Likelihood Functions
Markov analysis
Multigene Family
multigene phylogenies
phylogenetic informativeness
Phylogenetics
Phylogeny
Populations and Evolution
Taxonomy
topological score
Topology
tree of life
Web site
title Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T01%3A02%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Assessing%20the%20Performance%20of%20Single-Copy%20Genes%20for%20Recovering%20Robust%20Phylogenies&rft.jtitle=Systematic%20biology&rft.au=Aguileta,%20G.&rft.date=2008-08&rft.volume=57&rft.issue=4&rft.spage=613&rft.epage=627&rft.pages=613-627&rft.issn=1063-5157&rft.eissn=1076-836X&rft_id=info:doi/10.1080/10635150802306527&rft_dat=%3Cjstor_hal_p%3E27756379%3C/jstor_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=194773862&rft_id=info:pmid/18709599&rft_jstor_id=27756379&rft_oup_id=10.1080/10635150802306527&rfr_iscdi=true