FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models

Abstract. Motivation: Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or...

Detailed description

Saved in:
Bibliographic details
Published in: Bioinformatics (Oxford, England), 2020-07, Vol.36 (Supplement_1), p.i57-i65
Main authors: Molloy, Erin K, Warnow, Tandy
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page i65
container_issue Supplement_1
container_start_page i57
container_title Bioinformatics (Oxford, England)
container_volume 36
creator Molloy, Erin K
Warnow, Tandy
description Abstract. Motivation: Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. Results: We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. Availability and implementation: FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs). Supplementary information: Supplementary data are available at Bioinformatics online.
doi_str_mv 10.1093/bioinformatics/btaa444
format Article
fullrecord <record><control><sourceid>oup_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7355287</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btaa444</oup_id><sourcerecordid>10.1093/bioinformatics/btaa444</sourcerecordid><originalsourceid>FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</originalsourceid><addsrcrecordid>eNqNkF9LwzAUxYMobk6_wsgXmEuapm18EGQ4FSaCf55LenM7I11bklTw29vZOdybT_fCved3DoeQKWeXnCkxL2xj67JxGx0s-HkRtI7j-IiMuUjSWZxxfrzfmRiRM-8_GGOSyeSUjESUyFSoZEzMUvvw2FXPy5crWvY71bWhGqBzOiD1LYJFT4NDpOiD3fo1Ne1qg46usUZn4WdS07WVheG8ZVSN93TTGKz8OTkpdeXxYjcn5G15-7q4n62e7h4WN6sZxDIJMxGBEToDpQBjVkplSgaZMjIxmoE0IgOmCp1FiYkUMJ7FaWE0ljozCLFhYkKuB27bFRs0gHVwuspb18d2X3mjbX54qe17vm4-81RIGWVpD0gGALg-vcNyr-Us3_aeH_ae73rvhdO_znvZb9H9Ax8emq79L_Qbc-GauA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Molloy, Erin K ; Warnow, Tandy</creator><creatorcontrib>Molloy, Erin K ; Warnow, Tandy</creatorcontrib><description>Abstract Motivation Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. 
Results We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. Availability and implementation FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs). Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btaa444</identifier><identifier>PMID: 32657396</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Biometry ; Comparative and Functional Genomics ; Computer Simulation ; Gene Duplication ; Phylogeny</subject><ispartof>Bioinformatics (Oxford, England), 2020-07, Vol.36 (Supplement_1), p.i57-i65</ispartof><rights>The Author(s) 2020. Published by Oxford University Press. 2020</rights><rights>The Author(s) 2020. 
Published by Oxford University Press.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</citedby><cites>FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355287/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355287/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,1598,27903,27904,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32657396$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Molloy, Erin K</creatorcontrib><creatorcontrib>Warnow, Tandy</creatorcontrib><title>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. Results We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. 
Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. Availability and implementation FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs). Supplementary information Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Biometry</subject><subject>Comparative and Functional Genomics</subject><subject>Computer Simulation</subject><subject>Gene Duplication</subject><subject>Phylogeny</subject><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkF9LwzAUxYMobk6_wsgXmEuapm18EGQ4FSaCf55LenM7I11bklTw29vZOdybT_fCved3DoeQKWeXnCkxL2xj67JxGx0s-HkRtI7j-IiMuUjSWZxxfrzfmRiRM-8_GGOSyeSUjESUyFSoZEzMUvvw2FXPy5crWvY71bWhGqBzOiD1LYJFT4NDpOiD3fo1Ne1qg46usUZn4WdS07WVheG8ZVSN93TTGKz8OTkpdeXxYjcn5G15-7q4n62e7h4WN6sZxDIJMxGBEToDpQBjVkplSgaZMjIxmoE0IgOmCp1FiYkUMJ7FaWE0ljozCLFhYkKuB27bFRs0gHVwuspb18d2X3mjbX54qe17vm4-81RIGWVpD0gGALg-vcNyr-Us3_aeH_ae73rvhdO_znvZb9H9Ax8emq79L_Qbc-GauA</recordid><startdate>20200701</startdate><enddate>20200701</enddate><creator>Molloy, Erin K</creator><creator>Warnow, Tandy</creator><general>Oxford University Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>5PM</scope></search><sort><creationdate>20200701</creationdate><title>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</title><author>Molloy, Erin K ; Warnow, 
Tandy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Biometry</topic><topic>Comparative and Functional Genomics</topic><topic>Computer Simulation</topic><topic>Gene Duplication</topic><topic>Phylogeny</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Molloy, Erin K</creatorcontrib><creatorcontrib>Warnow, Tandy</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Molloy, Erin K</au><au>Warnow, Tandy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2020-07-01</date><risdate>2020</risdate><volume>36</volume><issue>Supplement_1</issue><spage>i57</spage><epage>i65</epage><pages>i57-i65</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>Abstract Motivation Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. 
All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. Results We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. Availability and implementation FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs). Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>32657396</pmid><doi>10.1093/bioinformatics/btaa444</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics (Oxford, England), 2020-07, Vol.36 (Supplement_1), p.i57-i65
issn 1367-4803
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7355287
source Oxford Journals Open Access Collection; MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects Algorithms
Biometry
Comparative and Functional Genomics
Computer Simulation
Gene Duplication
Phylogeny
title FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T03%3A07%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-oup_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FastMulRFS:%20fast%20and%20accurate%20species%20tree%20estimation%20under%20generic%20gene%20duplication%20and%20loss%20models&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Molloy,%20Erin%20K&rft.date=2020-07-01&rft.volume=36&rft.issue=Supplement_1&rft.spage=i57&rft.epage=i65&rft.pages=i57-i65&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btaa444&rft_dat=%3Coup_pubme%3E10.1093/bioinformatics/btaa444%3C/oup_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/32657396&rft_oup_id=10.1093/bioinformatics/btaa444&rfr_iscdi=true