FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models
Abstract Motivation Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or...
Published in: | Bioinformatics (Oxford, England), 2020-07, Vol.36 (Supplement_1), p.i57-i65 |
---|---|
Main authors: | Molloy, Erin K ; Warnow, Tandy |
Format: | Article |
Language: | eng |
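Subjects: | Algorithms ; Biometry ; Comparative and Functional Genomics ; Computer Simulation ; Gene Duplication ; Phylogeny |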
Online access: | Full text |
container_end_page | i65 |
---|---|
container_issue | Supplement_1 |
container_start_page | i57 |
container_title | Bioinformatics (Oxford, England) |
container_volume | 36 |
creator | Molloy, Erin K ; Warnow, Tandy |
description | Abstract
Motivation
Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed.
Results
We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods.
Availability and implementation
FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs).
Supplementary information
Supplementary data are available at Bioinformatics online. |
doi_str_mv | 10.1093/bioinformatics/btaa444 |
format | Article |
fullrecord | <record><control><sourceid>oup_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7355287</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btaa444</oup_id><sourcerecordid>10.1093/bioinformatics/btaa444</sourcerecordid><originalsourceid>FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</originalsourceid><addsrcrecordid>eNqNkF9LwzAUxYMobk6_wsgXmEuapm18EGQ4FSaCf55LenM7I11bklTw29vZOdybT_fCved3DoeQKWeXnCkxL2xj67JxGx0s-HkRtI7j-IiMuUjSWZxxfrzfmRiRM-8_GGOSyeSUjESUyFSoZEzMUvvw2FXPy5crWvY71bWhGqBzOiD1LYJFT4NDpOiD3fo1Ne1qg46usUZn4WdS07WVheG8ZVSN93TTGKz8OTkpdeXxYjcn5G15-7q4n62e7h4WN6sZxDIJMxGBEToDpQBjVkplSgaZMjIxmoE0IgOmCp1FiYkUMJ7FaWE0ljozCLFhYkKuB27bFRs0gHVwuspb18d2X3mjbX54qe17vm4-81RIGWVpD0gGALg-vcNyr-Us3_aeH_ae73rvhdO_znvZb9H9Ax8emq79L_Qbc-GauA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Molloy, Erin K ; Warnow, Tandy</creator><creatorcontrib>Molloy, Erin K ; Warnow, Tandy</creatorcontrib><description>Abstract
Motivation
Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed.
Results
We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods.
Availability and implementation
FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs).
Supplementary information
Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btaa444</identifier><identifier>PMID: 32657396</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Biometry ; Comparative and Functional Genomics ; Computer Simulation ; Gene Duplication ; Phylogeny</subject><ispartof>Bioinformatics (Oxford, England), 2020-07, Vol.36 (Supplement_1), p.i57-i65</ispartof><rights>The Author(s) 2020. Published by Oxford University Press. 2020</rights><rights>The Author(s) 2020. Published by Oxford University Press.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</citedby><cites>FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355287/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355287/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,1598,27903,27904,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32657396$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Molloy, Erin K</creatorcontrib><creatorcontrib>Warnow, Tandy</creatorcontrib><title>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Abstract
Motivation
Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed.
Results
We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods.
Availability and implementation
FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs).
Supplementary information
Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Biometry</subject><subject>Comparative and Functional Genomics</subject><subject>Computer Simulation</subject><subject>Gene Duplication</subject><subject>Phylogeny</subject><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkF9LwzAUxYMobk6_wsgXmEuapm18EGQ4FSaCf55LenM7I11bklTw29vZOdybT_fCved3DoeQKWeXnCkxL2xj67JxGx0s-HkRtI7j-IiMuUjSWZxxfrzfmRiRM-8_GGOSyeSUjESUyFSoZEzMUvvw2FXPy5crWvY71bWhGqBzOiD1LYJFT4NDpOiD3fo1Ne1qg46usUZn4WdS07WVheG8ZVSN93TTGKz8OTkpdeXxYjcn5G15-7q4n62e7h4WN6sZxDIJMxGBEToDpQBjVkplSgaZMjIxmoE0IgOmCp1FiYkUMJ7FaWE0ljozCLFhYkKuB27bFRs0gHVwuspb18d2X3mjbX54qe17vm4-81RIGWVpD0gGALg-vcNyr-Us3_aeH_ae73rvhdO_znvZb9H9Ax8emq79L_Qbc-GauA</recordid><startdate>20200701</startdate><enddate>20200701</enddate><creator>Molloy, Erin K</creator><creator>Warnow, Tandy</creator><general>Oxford University Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>5PM</scope></search><sort><creationdate>20200701</creationdate><title>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</title><author>Molloy, Erin K ; Warnow, Tandy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c456t-32cd3a8c99ce40f59df0c89d56da0c5d38c09ba826d29c01847bdaefa8dec4d03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Biometry</topic><topic>Comparative and Functional Genomics</topic><topic>Computer Simulation</topic><topic>Gene 
Duplication</topic><topic>Phylogeny</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Molloy, Erin K</creatorcontrib><creatorcontrib>Warnow, Tandy</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Molloy, Erin K</au><au>Warnow, Tandy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2020-07-01</date><risdate>2020</risdate><volume>36</volume><issue>Supplement_1</issue><spage>i57</spage><epage>i65</epage><pages>i57-i65</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>Abstract
Motivation
Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed.
Results
We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods.
Availability and implementation
FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs).
Supplementary information
Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>32657396</pmid><doi>10.1093/bioinformatics/btaa444</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics (Oxford, England), 2020-07, Vol.36 (Supplement_1), p.i57-i65 |
issn | 1367-4803 ; 1367-4811 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7355287 |
source | Oxford Journals Open Access Collection; MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection |
subjects | Algorithms ; Biometry ; Comparative and Functional Genomics ; Computer Simulation ; Gene Duplication ; Phylogeny |
title | FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T03%3A07%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-oup_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FastMulRFS:%20fast%20and%20accurate%20species%20tree%20estimation%20under%20generic%20gene%20duplication%20and%20loss%20models&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Molloy,%20Erin%20K&rft.date=2020-07-01&rft.volume=36&rft.issue=Supplement_1&rft.spage=i57&rft.epage=i65&rft.pages=i57-i65&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btaa444&rft_dat=%3Coup_pubme%3E10.1093/bioinformatics/btaa444%3C/oup_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/32657396&rft_oup_id=10.1093/bioinformatics/btaa444&rfr_iscdi=true |