Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased

• Genes with minimal phylogenetic information can compromise gene tree estimation.
• Gene tree estimation using PhyML can be biased toward one particular topology.
• The multilocus bootstrapping approach is important for species tree estimation.
• No incongruence is identified in coalesce...

Bibliographic details
Published in: Molecular phylogenetics and evolution 2015-11, Vol.92, p.63-71
Main authors: Xi, Zhenxiang, Liu, Liang, Davis, Charles C.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 71
container_issue
container_start_page 63
container_title Molecular phylogenetics and evolution
container_volume 92
creator Xi, Zhenxiang
Liu, Liang
Davis, Charles C.
description • Genes with minimal phylogenetic information can compromise gene tree estimation.
• Gene tree estimation using PhyML can be biased toward one particular topology.
• The multilocus bootstrapping approach is important for species tree estimation.
• No incongruence is identified in coalescent analyses of mammalian phylogeny.
The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).
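The abstract names the multilocus bootstrap approach without describing it. As a rough illustration only, and not the authors' implementation, the sketch below shows the usual two-stage idea: genes are resampled with replacement, and then sites are resampled with replacement within each chosen gene. The function name `multilocus_bootstrap`, the dictionary-based alignment layout, and the toy sequences are all assumptions made for this example; none of them come from the record.

```python
import random


def multilocus_bootstrap(alignments, rng):
    """Build one bootstrap pseudo-replicate (hypothetical helper).

    `alignments` is a list of genes; each gene is a dict mapping a
    taxon name to its aligned sequence (all sequences in a gene have
    the same length).
    """
    replicate = []
    n_genes = len(alignments)
    for _ in range(n_genes):
        # Stage 1: draw a gene with replacement.
        gene = alignments[rng.randrange(n_genes)]
        n_sites = len(next(iter(gene.values())))
        # Stage 2: draw alignment columns with replacement within that gene.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicate.append(
            {taxon: "".join(seq[c] for c in cols) for taxon, seq in gene.items()}
        )
    return replicate


# Toy data, invented for illustration: two genes, three taxa.
genes = [
    {"A": "ACGTAC", "B": "ACGTAA", "C": "ACGGAA"},
    {"A": "TTGA", "B": "TTGC", "C": "TAGC"},
]
rep = multilocus_bootstrap(genes, random.Random(0))
```

In a real analysis, each pseudo-replicate would then be passed to a gene tree estimator and a coalescent species tree method, and support values would be summarized across many replicates; those steps are outside the scope of this sketch.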
doi_str_mv 10.1016/j.ympev.2015.06.009
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1716935129</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1055790315001797</els_id><sourcerecordid>1716935129</sourcerecordid><originalsourceid>FETCH-LOGICAL-c409t-c98288c2b6d0c9f91008049c9b3990dab932fcc02bcc3608b0ccc9e053d4ee543</originalsourceid><addsrcrecordid>eNp9kE9v1DAQxa0K1H_wCZCQj1wSxnGcZg4cUFVKpUpcytlyJhPWq8RZ7Gyr_fZ12IUjJ49m3rx5_gnxQUGpQDWft-Vh2vFzWYEyJTQlAJ6JSwVoCjRKv1lrY4obBH0hrlLaAihl0JyLi6rJVVvXl-LlngMn-eKXjZx88JMb5W5zGOdfub94kj4Mc5zc4ucgXWS5i3M38togmSeSZjdyIg6LdMGNh7S6bTjI1UAukVlyWvzJwSfZeZe4fyfeDm5M_P70Xouf3-6ebr8Xjz_uH26_PhZUAy4FYVu1LVVd0wPhgAqghRoJO40IvetQVwMRVB2RbqDtgIiQwei-Zja1vhafjr459-99TmInn9OOows875NVN6pBbVSFWaqPUopzSpEHu4s5dzxYBXYlbrf2D3G7ErfQ2Ew8b308Hdh3E_f_dv4izoIvRwHnbz57jjaR50Dc-8i02H72_z3wCv5pleQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1716935129</pqid></control><display><type>article</type><title>Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Xi, Zhenxiang ; Liu, Liang ; Davis, Charles C.</creator><creatorcontrib>Xi, Zhenxiang ; Liu, Liang ; Davis, Charles C.</creatorcontrib><description>[Display omitted] •Genes with minimal phylogenetic information can compromise gene tree estimation.•Gene tree estimation using PhyML can be biased toward one particular topology.•The multilocus bootstrapping approach is important for species tree estimation.•No incongruence is identified in coalescent analyses of mammalian phylogeny. The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. 
Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. 
Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).</description><identifier>ISSN: 1055-7903</identifier><identifier>EISSN: 1095-9513</identifier><identifier>DOI: 10.1016/j.ympev.2015.06.009</identifier><identifier>PMID: 26115844</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Animals ; Base Sequence ; Concatenation methods ; Datasets as Topic ; DNA - genetics ; Gene informativeness ; Gene tree estimation ; Gene-tree-based coalescent methods ; Genes ; Mammals - classification ; Mammals - genetics ; Multilocus bootstrap approach ; Phylogeny ; PhyML ; Software</subject><ispartof>Molecular phylogenetics and evolution, 2015-11, Vol.92, p.63-71</ispartof><rights>2015 Elsevier Inc.</rights><rights>Copyright © 2015 Elsevier Inc. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c409t-c98288c2b6d0c9f91008049c9b3990dab932fcc02bcc3608b0ccc9e053d4ee543</citedby><cites>FETCH-LOGICAL-c409t-c98288c2b6d0c9f91008049c9b3990dab932fcc02bcc3608b0ccc9e053d4ee543</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1055790315001797$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26115844$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Xi, Zhenxiang</creatorcontrib><creatorcontrib>Liu, Liang</creatorcontrib><creatorcontrib>Davis, Charles C.</creatorcontrib><title>Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased</title><title>Molecular phylogenetics and evolution</title><addtitle>Mol Phylogenet Evol</addtitle><description>[Display omitted] •Genes with minimal phylogenetic information can compromise gene tree estimation.•Gene tree estimation using PhyML can be biased toward one particular topology.•The multilocus bootstrapping approach is important for species tree estimation.•No incongruence is identified in coalescent analyses of mammalian phylogeny. The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. 
Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. 
Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).</description><subject>Animals</subject><subject>Base Sequence</subject><subject>Concatenation methods</subject><subject>Datasets as Topic</subject><subject>DNA - genetics</subject><subject>Gene informativeness</subject><subject>Gene tree estimation</subject><subject>Gene-tree-based coalescent methods</subject><subject>Genes</subject><subject>Mammals - classification</subject><subject>Mammals - genetics</subject><subject>Multilocus bootstrap approach</subject><subject>Phylogeny</subject><subject>PhyML</subject><subject>Software</subject><issn>1055-7903</issn><issn>1095-9513</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kE9v1DAQxa0K1H_wCZCQj1wSxnGcZg4cUFVKpUpcytlyJhPWq8RZ7Gyr_fZ12IUjJ49m3rx5_gnxQUGpQDWft-Vh2vFzWYEyJTQlAJ6JSwVoCjRKv1lrY4obBH0hrlLaAihl0JyLi6rJVVvXl-LlngMn-eKXjZx88JMb5W5zGOdfub94kj4Mc5zc4ucgXWS5i3M38togmSeSZjdyIg6LdMGNh7S6bTjI1UAukVlyWvzJwSfZeZe4fyfeDm5M_P70Xouf3-6ebr8Xjz_uH26_PhZUAy4FYVu1LVVd0wPhgAqghRoJO40IvetQVwMRVB2RbqDtgIiQwei-Zja1vhafjr459-99TmInn9OOows875NVN6pBbVSFWaqPUopzSpEHu4s5dzxYBXYlbrf2D3G7ErfQ2Ew8b308Hdh3E_f_dv4izoIvRwHnbz57jjaR50Dc-8i02H72_z3wCv5pleQ</recordid><startdate>201511</startdate><enddate>201511</enddate><creator>Xi, Zhenxiang</creator><creator>Liu, Liang</creator><creator>Davis, Charles C.</creator><general>Elsevier Inc</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>201511</creationdate><title>Genes with minimal phylogenetic information are problematic for coalescent 
analyses when gene tree estimation is biased</title><author>Xi, Zhenxiang ; Liu, Liang ; Davis, Charles C.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c409t-c98288c2b6d0c9f91008049c9b3990dab932fcc02bcc3608b0ccc9e053d4ee543</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Animals</topic><topic>Base Sequence</topic><topic>Concatenation methods</topic><topic>Datasets as Topic</topic><topic>DNA - genetics</topic><topic>Gene informativeness</topic><topic>Gene tree estimation</topic><topic>Gene-tree-based coalescent methods</topic><topic>Genes</topic><topic>Mammals - classification</topic><topic>Mammals - genetics</topic><topic>Multilocus bootstrap approach</topic><topic>Phylogeny</topic><topic>PhyML</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xi, Zhenxiang</creatorcontrib><creatorcontrib>Liu, Liang</creatorcontrib><creatorcontrib>Davis, Charles C.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Molecular phylogenetics and evolution</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xi, Zhenxiang</au><au>Liu, Liang</au><au>Davis, Charles C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased</atitle><jtitle>Molecular phylogenetics and evolution</jtitle><addtitle>Mol Phylogenet 
Evol</addtitle><date>2015-11</date><risdate>2015</risdate><volume>92</volume><spage>63</spage><epage>71</epage><pages>63-71</pages><issn>1055-7903</issn><eissn>1095-9513</eissn><abstract>[Display omitted] •Genes with minimal phylogenetic information can compromise gene tree estimation.•Gene tree estimation using PhyML can be biased toward one particular topology.•The multilocus bootstrapping approach is important for species tree estimation.•No incongruence is identified in coalescent analyses of mammalian phylogeny. The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. 
(2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>26115844</pmid><doi>10.1016/j.ympev.2015.06.009</doi><tpages>9</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1055-7903
ispartof Molecular phylogenetics and evolution, 2015-11, Vol.92, p.63-71
issn 1055-7903
1095-9513
language eng
recordid cdi_proquest_miscellaneous_1716935129
source MEDLINE; Elsevier ScienceDirect Journals
subjects Animals
Base Sequence
Concatenation methods
Datasets as Topic
DNA - genetics
Gene informativeness
Gene tree estimation
Gene-tree-based coalescent methods
Genes
Mammals - classification
Mammals - genetics
Multilocus bootstrap approach
Phylogeny
PhyML
Software
title Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T11%3A04%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Genes%20with%20minimal%20phylogenetic%20information%20are%20problematic%20for%20coalescent%20analyses%20when%20gene%20tree%20estimation%20is%20biased&rft.jtitle=Molecular%20phylogenetics%20and%20evolution&rft.au=Xi,%20Zhenxiang&rft.date=2015-11&rft.volume=92&rft.spage=63&rft.epage=71&rft.pages=63-71&rft.issn=1055-7903&rft.eissn=1095-9513&rft_id=info:doi/10.1016/j.ympev.2015.06.009&rft_dat=%3Cproquest_cross%3E1716935129%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1716935129&rft_id=info:pmid/26115844&rft_els_id=S1055790315001797&rfr_iscdi=true