The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics

Abstract Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multipl...

Bibliographic details
Published in: Systematic biology 2024-10, Vol.73 (5), p.807-822
Main authors: Liu, Chao, Zhou, Xiaofan, Li, Yuanning, Hittinger, Chris Todd, Pan, Ronghui, Huang, Jinyan, Chen, Xue-xin, Rokas, Antonis, Chen, Yun, Shen, Xing-Xing
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 822
container_issue 5
container_start_page 807
container_title Systematic biology
container_volume 73
creator Liu, Chao
Zhou, Xiaofan
Li, Yuanning
Hittinger, Chris Todd
Pan, Ronghui
Huang, Jinyan
Chen, Xue-xin
Rokas, Antonis
Chen, Yun
Shen, Xing-Xing
description Abstract Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.
doi_str_mv 10.1093/sysbio/syae031
format Article
fullrecord <record><control><sourceid>proquest_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_2406411</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/sysbio/syae031</oup_id><sourcerecordid>3073234536</sourcerecordid><originalsourceid>FETCH-LOGICAL-c241t-20e02025e5f9e0580b6880bfba3bbfdb7c3d6cec0129b1a897a1d2dc666adeff3</originalsourceid><addsrcrecordid>eNqFkM1LwzAYxoMobk6vHqV40kNn0rRpe5Thx2B-gBO8SEjSNzbaNjNpwf33tm569fJ-8XseXh6EjgmeEpzTC7_20ti-CcCU7KAxwSkLM8pedoeZ0TAhSTpCB96_Y0wIS8g-GtEsj3G_jtHrsoRg3uiqg0ZBYHXQ9of7rpbghm3pAIInEE6V4APbBHfiy9RdHSzMB1SmtLYY5OB-5KYJHst1Zd-gsbVR_hDtaVF5ONr2CXq-vlrObsPFw818drkIVRSTNoww4AhHCSQ6B5xkWLKsL1oKKqUuZKpowRQoTKJcEpHlqSBFVCjGmChAazpBpxtf61vDvTItqFLZpgHV8ijGLCakh8420MrZzw58y2vjFVSVaMB2nlOc0ojGCWU9Ot2gylnvHWi-cqYWbs0J5kPufJM73-beC0623p2sofjDf4PugfPth93qP7Nv_DuPtQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3073234536</pqid></control><display><type>article</type><title>The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics</title><source>Oxford University Press Journals All Titles (1996-Current)</source><creator>Liu, Chao ; Zhou, Xiaofan ; Li, Yuanning ; Hittinger, Chris Todd ; Pan, Ronghui ; Huang, Jinyan ; Chen, Xue-xin ; Rokas, Antonis ; Chen, Yun ; Shen, Xing-Xing</creator><contributor>Gascuel, Olivier</contributor><creatorcontrib>Liu, Chao ; Zhou, Xiaofan ; Li, Yuanning ; Hittinger, Chris Todd ; Pan, Ronghui ; Huang, Jinyan ; Chen, Xue-xin ; Rokas, Antonis ; Chen, Yun ; Shen, Xing-Xing ; Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States) ; Gascuel, Olivier</creatorcontrib><description>Abstract Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. 
However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.</description><identifier>ISSN: 1063-5157</identifier><identifier>ISSN: 1076-836X</identifier><identifier>EISSN: 1076-836X</identifier><identifier>DOI: 10.1093/sysbio/syae031</identifier><identifier>PMID: 38940001</identifier><language>eng</language><publisher>US: Oxford University Press</publisher><subject>Heuristic tree search ; hill-climbing ; local optima ; maximum likelihood ; phylogenomics ; species tree estimation</subject><ispartof>Systematic biology, 2024-10, Vol.73 (5), p.807-822</ispartof><rights>The Author(s) 2024. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. 
All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com. 2024</rights><rights>The Author(s) 2024. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c241t-20e02025e5f9e0580b6880bfba3bbfdb7c3d6cec0129b1a897a1d2dc666adeff3</cites><orcidid>0000-0002-2206-5804 ; 0000-0001-5765-1419 ; 0000-0001-5088-7461 ; 0000-0002-7248-6551 ; 0000-0002-2879-6317 ; 0000-0002-9109-8853 ; 0000-0002-5663-2352 ; 0000000256632352 ; 0000000222065804 ; 0000000228796317 ; 0000000157651419 ; 0000000150887461 ; 0000000291098853 ; 0000000272486551</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,778,782,883,1581,27911,27912</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38940001$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/2406411$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><contributor>Gascuel, Olivier</contributor><creatorcontrib>Liu, Chao</creatorcontrib><creatorcontrib>Zhou, Xiaofan</creatorcontrib><creatorcontrib>Li, Yuanning</creatorcontrib><creatorcontrib>Hittinger, Chris Todd</creatorcontrib><creatorcontrib>Pan, Ronghui</creatorcontrib><creatorcontrib>Huang, Jinyan</creatorcontrib><creatorcontrib>Chen, 
Xue-xin</creatorcontrib><creatorcontrib>Rokas, Antonis</creatorcontrib><creatorcontrib>Chen, Yun</creatorcontrib><creatorcontrib>Shen, Xing-Xing</creatorcontrib><creatorcontrib>Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States)</creatorcontrib><title>The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics</title><title>Systematic biology</title><addtitle>Syst Biol</addtitle><description>Abstract Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. 
Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.</description><subject>Heuristic tree search</subject><subject>hill-climbing</subject><subject>local optima</subject><subject>maximum likelihood</subject><subject>phylogenomics</subject><subject>species tree estimation</subject><issn>1063-5157</issn><issn>1076-836X</issn><issn>1076-836X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNqFkM1LwzAYxoMobk6vHqV40kNn0rRpe5Thx2B-gBO8SEjSNzbaNjNpwf33tm569fJ-8XseXh6EjgmeEpzTC7_20ti-CcCU7KAxwSkLM8pedoeZ0TAhSTpCB96_Y0wIS8g-GtEsj3G_jtHrsoRg3uiqg0ZBYHXQ9of7rpbghm3pAIInEE6V4APbBHfiy9RdHSzMB1SmtLYY5OB-5KYJHst1Zd-gsbVR_hDtaVF5ONr2CXq-vlrObsPFw818drkIVRSTNoww4AhHCSQ6B5xkWLKsL1oKKqUuZKpowRQoTKJcEpHlqSBFVCjGmChAazpBpxtf61vDvTItqFLZpgHV8ijGLCakh8420MrZzw58y2vjFVSVaMB2nlOc0ojGCWU9Ot2gylnvHWi-cqYWbs0J5kPufJM73-beC0623p2sofjDf4PugfPth93qP7Nv_DuPtQ</recordid><startdate>20241030</startdate><enddate>20241030</enddate><creator>Liu, Chao</creator><creator>Zhou, Xiaofan</creator><creator>Li, Yuanning</creator><creator>Hittinger, Chris Todd</creator><creator>Pan, Ronghui</creator><creator>Huang, Jinyan</creator><creator>Chen, Xue-xin</creator><creator>Rokas, Antonis</creator><creator>Chen, Yun</creator><creator>Shen, Xing-Xing</creator><general>Oxford University Press</general><general>Society of Systematic Biologists - Oxford University 
Press</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0002-2206-5804</orcidid><orcidid>https://orcid.org/0000-0001-5765-1419</orcidid><orcidid>https://orcid.org/0000-0001-5088-7461</orcidid><orcidid>https://orcid.org/0000-0002-7248-6551</orcidid><orcidid>https://orcid.org/0000-0002-2879-6317</orcidid><orcidid>https://orcid.org/0000-0002-9109-8853</orcidid><orcidid>https://orcid.org/0000-0002-5663-2352</orcidid><orcidid>https://orcid.org/0000000256632352</orcidid><orcidid>https://orcid.org/0000000222065804</orcidid><orcidid>https://orcid.org/0000000228796317</orcidid><orcidid>https://orcid.org/0000000157651419</orcidid><orcidid>https://orcid.org/0000000150887461</orcidid><orcidid>https://orcid.org/0000000291098853</orcidid><orcidid>https://orcid.org/0000000272486551</orcidid></search><sort><creationdate>20241030</creationdate><title>The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics</title><author>Liu, Chao ; Zhou, Xiaofan ; Li, Yuanning ; Hittinger, Chris Todd ; Pan, Ronghui ; Huang, Jinyan ; Chen, Xue-xin ; Rokas, Antonis ; Chen, Yun ; Shen, Xing-Xing</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c241t-20e02025e5f9e0580b6880bfba3bbfdb7c3d6cec0129b1a897a1d2dc666adeff3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Heuristic tree search</topic><topic>hill-climbing</topic><topic>local optima</topic><topic>maximum likelihood</topic><topic>phylogenomics</topic><topic>species tree estimation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Chao</creatorcontrib><creatorcontrib>Zhou, Xiaofan</creatorcontrib><creatorcontrib>Li, Yuanning</creatorcontrib><creatorcontrib>Hittinger, Chris Todd</creatorcontrib><creatorcontrib>Pan, Ronghui</creatorcontrib><creatorcontrib>Huang, 
Jinyan</creatorcontrib><creatorcontrib>Chen, Xue-xin</creatorcontrib><creatorcontrib>Rokas, Antonis</creatorcontrib><creatorcontrib>Chen, Yun</creatorcontrib><creatorcontrib>Shen, Xing-Xing</creatorcontrib><creatorcontrib>Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States)</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><jtitle>Systematic biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Chao</au><au>Zhou, Xiaofan</au><au>Li, Yuanning</au><au>Hittinger, Chris Todd</au><au>Pan, Ronghui</au><au>Huang, Jinyan</au><au>Chen, Xue-xin</au><au>Rokas, Antonis</au><au>Chen, Yun</au><au>Shen, Xing-Xing</au><au>Gascuel, Olivier</au><aucorp>Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics</atitle><jtitle>Systematic biology</jtitle><addtitle>Syst Biol</addtitle><date>2024-10-30</date><risdate>2024</risdate><volume>73</volume><issue>5</issue><spage>807</spage><epage>822</epage><pages>807-822</pages><issn>1063-5157</issn><issn>1076-836X</issn><eissn>1076-836X</eissn><abstract>Abstract Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. 
We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.</abstract><cop>US</cop><pub>Oxford University Press</pub><pmid>38940001</pmid><doi>10.1093/sysbio/syae031</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-2206-5804</orcidid><orcidid>https://orcid.org/0000-0001-5765-1419</orcidid><orcidid>https://orcid.org/0000-0001-5088-7461</orcidid><orcidid>https://orcid.org/0000-0002-7248-6551</orcidid><orcidid>https://orcid.org/0000-0002-2879-6317</orcidid><orcidid>https://orcid.org/0000-0002-9109-8853</orcidid><orcidid>https://orcid.org/0000-0002-5663-2352</orcidid><orcidid>https://orcid.org/0000000256632352</orcidid><orcidid>https://orcid.org/0000000222065804</orcidid><orcidid>https://orcid.org/0000000228796317</orcidid><orcidid>https://orcid.org/0000000157651419</orcidid><orcidid>https://orcid.org/0000000150887461</orcidid><orcidid>https://orcid.org/0000000291098853</orcidid><orcidid>https://orcid.org/0000000272486551</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1063-5157
ispartof Systematic biology, 2024-10, Vol.73 (5), p.807-822
issn 1063-5157
1076-836X
1076-836X
language eng
recordid cdi_osti_scitechconnect_2406411
source Oxford University Press Journals All Titles (1996-Current)
subjects Heuristic tree search
hill-climbing
local optima
maximum likelihood
phylogenomics
species tree estimation
title The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T21%3A16%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Influence%20of%20the%20Number%20of%20Tree%20Searches%20on%20Maximum%20Likelihood%20Inference%20in%20Phylogenomics&rft.jtitle=Systematic%20biology&rft.au=Liu,%20Chao&rft.aucorp=Great%20Lakes%20Bioenergy%20Research%20Center%20(GLBRC),%20Madison,%20WI%20(United%20States)&rft.date=2024-10-30&rft.volume=73&rft.issue=5&rft.spage=807&rft.epage=822&rft.pages=807-822&rft.issn=1063-5157&rft.eissn=1076-836X&rft_id=info:doi/10.1093/sysbio/syae031&rft_dat=%3Cproquest_osti_%3E3073234536%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3073234536&rft_id=info:pmid/38940001&rft_oup_id=10.1093/sysbio/syae031&rfr_iscdi=true