Combinatorics of distance-based tree inference
Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible t...
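A minimal sketch of the objective described in this abstract, for the ordinary least-squares case. Given a tree with branch lengths, it computes the path length between every pair of leaves and sums the squared differences from the input distances. The adjacency-dict tree representation and the function names `leaf_distances` and `ols_discrepancy` are hypothetical illustrations, not code from the paper.

```python
from itertools import combinations

# Hypothetical representation: an unrooted tree as {node: {neighbor: branch_length}}.
# Leaves are the nodes with exactly one neighbor.


def leaf_distances(tree):
    """Return the sorted leaves and the path length between every ordered pair of leaves."""
    leaves = sorted(n for n, nbrs in tree.items() if len(nbrs) == 1)
    dist = {}
    for src in leaves:
        # Iterative depth-first walk from src, accumulating branch lengths.
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, length = stack.pop()
            if node != src and len(tree[node]) == 1:
                dist[(src, node)] = length
            for nbr, branch in tree[node].items():
                if nbr != parent:
                    stack.append((nbr, node, length + branch))
    return leaves, dist


def ols_discrepancy(tree, observed):
    """Ordinary least-squares criterion: sum over leaf pairs of (observed - tree distance)^2.

    `observed` maps frozenset({i, j}) to the input distance between leaves i and j.
    """
    leaves, dist = leaf_distances(tree)
    return sum(
        (observed[frozenset((i, j))] - dist[(i, j)]) ** 2
        for i, j in combinations(leaves, 2)
    )


# Usage: the quartet ((a,b),(c,d)) with internal nodes u and v.
tree = {
    "a": {"u": 1.0}, "b": {"u": 2.0},
    "u": {"a": 1.0, "b": 2.0, "v": 3.0},
    "v": {"u": 3.0, "c": 4.0, "d": 5.0},
    "c": {"v": 4.0}, "d": {"v": 5.0},
}
observed = {frozenset(p): x for p, x in {
    ("a", "b"): 3.0, ("a", "c"): 8.0, ("a", "d"): 9.0,
    ("b", "c"): 9.0, ("b", "d"): 10.0, ("c", "d"): 9.0,
}.items()}
print(ols_discrepancy(tree, observed))  # 0.0: these distances fit the tree exactly
```

A criterion of zero means the input distances are exactly the path lengths of the tree. With noisy distances, the least-squares approach looks for the branch lengths (and topology) that make this sum as small as possible.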
Published in: | Proceedings of the National Academy of Sciences - PNAS 2012-10, Vol.109 (41), p.16443-16448 |
---|---|
Main authors: | Pardi, Fabio; Gascuel, Olivier |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 16448 |
---|---|
container_issue | 41 |
container_start_page | 16443 |
container_title | Proceedings of the National Academy of Sciences - PNAS |
container_volume | 109 |
creator | Pardi, Fabio; Gascuel, Olivier |
description | Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances. |
doi_str_mv | 10.1073/pnas.1118368109 |
format | Article |
fullrecord | <record><control><sourceid>jstor_proqu</sourceid><recordid>TN_cdi_proquest_journals_1095668972</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>41763367</jstor_id><sourcerecordid>41763367</sourcerecordid><originalsourceid>FETCH-LOGICAL-c504t-40df0cb5f0dea485dd01782ac3b9d2fbdb0255e56a53180474edcc4eeb6602e3</originalsourceid><addsrcrecordid>eNpdkc9rFDEUx4Modq2ePSkDXgSZ7cuPSTIXoSxqhQUvvYdM8sZmmZmsyWyh_70Zd91qLwnkfd4nyfsS8pbCmoLiV_vJ5jWlVHOpKbTPyKqstJaihedkBcBUrQUTF-RVzjsAaBsNL8kF40CZAL4i600cuzDZOabgchX7yoc828lh3dmMvpoTYhWmHhOWw9fkRW-HjG9O-yW5_frldnNTb398-7653tauATHXAnwPrmt68GiFbrwHqjSzjnetZ33nO2BNg420DacahBLonROInZTAkF-Sz0ft_tCNpYTTnOxg9imMNj2YaIP5vzKFO_Mz3hsulJZMFcGno-DuSdvN9dYMIY2jAVBMcknvWaE_nq5L8dcB82zGkB0Og50wHrKhy9w4MLGIPzxBd_GQpjKLP5SUulWL8OpIuRRzTtif30DBLMGZJTjzGFzpeP_vj8_836QKUJ2ApfNR1xpBDZVCLMi7I7LLJc4zI6iSnEvFfwPTOadX</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1095668972</pqid></control><display><type>article</type><title>Combinatorics of distance-based tree inference</title><source>MEDLINE</source><source>JSTOR Archive Collection A-Z Listing</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Pardi, Fabio ; Gascuel, Olivier</creator><creatorcontrib>Pardi, Fabio ; Gascuel, Olivier</creatorcontrib><description>Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. 
Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances.</description><identifier>ISSN: 0027-8424</identifier><identifier>EISSN: 1091-6490</identifier><identifier>DOI: 10.1073/pnas.1118368109</identifier><identifier>PMID: 23012403</identifier><language>eng</language><publisher>United States: National Academy of Sciences</publisher><subject>Algorithms ; Animals ; Bioinformatics ; Biological Sciences ; Branches ; Cluster Analysis ; Combinatorics ; Computational Biology - methods ; Computer Science ; Computer Simulation ; Estimation methods ; Evolution, Molecular ; Gene expression ; Genes ; Heuristic ; Humans ; Inference ; Least squares ; Leaves ; Life Sciences ; Models, Genetic ; Molecular evolution ; Phylogenetics ; Phylogeny ; Physical Sciences ; Quantitative Methods ; Reproducibility of Results ; Statistical discrepancies ; Taxa ; Topology ; Trees</subject><ispartof>Proceedings of the National Academy of Sciences - PNAS, 2012-10, Vol.109 (41), p.16443-16448</ispartof><rights>copyright © 1993-2008 National Academy of Sciences of the United States of America</rights><rights>Copyright National Academy of Sciences Oct 9, 2012</rights><rights>Distributed under a Creative 
Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c504t-40df0cb5f0dea485dd01782ac3b9d2fbdb0255e56a53180474edcc4eeb6602e3</citedby><cites>FETCH-LOGICAL-c504t-40df0cb5f0dea485dd01782ac3b9d2fbdb0255e56a53180474edcc4eeb6602e3</cites><orcidid>0000-0001-8084-1464 ; 0000-0002-9412-9723</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.pnas.org/content/109/41.cover.gif</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/41763367$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/41763367$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>230,314,727,780,784,803,885,27922,27923,53789,53791,58015,58248</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23012403$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://hal-lirmm.ccsd.cnrs.fr/lirmm-00726361$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Pardi, Fabio</creatorcontrib><creatorcontrib>Gascuel, Olivier</creatorcontrib><title>Combinatorics of distance-based tree inference</title><title>Proceedings of the National Academy of Sciences - PNAS</title><addtitle>Proc Natl Acad Sci U S A</addtitle><description>Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. 
Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances.</description><subject>Algorithms</subject><subject>Animals</subject><subject>Bioinformatics</subject><subject>Biological Sciences</subject><subject>Branches</subject><subject>Cluster Analysis</subject><subject>Combinatorics</subject><subject>Computational Biology - methods</subject><subject>Computer Science</subject><subject>Computer Simulation</subject><subject>Estimation methods</subject><subject>Evolution, Molecular</subject><subject>Gene expression</subject><subject>Genes</subject><subject>Heuristic</subject><subject>Humans</subject><subject>Inference</subject><subject>Least squares</subject><subject>Leaves</subject><subject>Life Sciences</subject><subject>Models, Genetic</subject><subject>Molecular evolution</subject><subject>Phylogenetics</subject><subject>Phylogeny</subject><subject>Physical Sciences</subject><subject>Quantitative Methods</subject><subject>Reproducibility of Results</subject><subject>Statistical 
discrepancies</subject><subject>Taxa</subject><subject>Topology</subject><subject>Trees</subject><issn>0027-8424</issn><issn>1091-6490</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkc9rFDEUx4Modq2ePSkDXgSZ7cuPSTIXoSxqhQUvvYdM8sZmmZmsyWyh_70Zd91qLwnkfd4nyfsS8pbCmoLiV_vJ5jWlVHOpKbTPyKqstJaihedkBcBUrQUTF-RVzjsAaBsNL8kF40CZAL4i600cuzDZOabgchX7yoc828lh3dmMvpoTYhWmHhOWw9fkRW-HjG9O-yW5_frldnNTb398-7653tauATHXAnwPrmt68GiFbrwHqjSzjnetZ33nO2BNg420DacahBLonROInZTAkF-Sz0ft_tCNpYTTnOxg9imMNj2YaIP5vzKFO_Mz3hsulJZMFcGno-DuSdvN9dYMIY2jAVBMcknvWaE_nq5L8dcB82zGkB0Og50wHrKhy9w4MLGIPzxBd_GQpjKLP5SUulWL8OpIuRRzTtif30DBLMGZJTjzGFzpeP_vj8_836QKUJ2ApfNR1xpBDZVCLMi7I7LLJc4zI6iSnEvFfwPTOadX</recordid><startdate>20121009</startdate><enddate>20121009</enddate><creator>Pardi, Fabio</creator><creator>Gascuel, Olivier</creator><general>National Academy of Sciences</general><general>National Acad Sciences</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QG</scope><scope>7QL</scope><scope>7QP</scope><scope>7QR</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TK</scope><scope>7TM</scope><scope>7TO</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>1XC</scope><scope>VOOES</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-8084-1464</orcidid><orcidid>https://orcid.org/0000-0002-9412-9723</orcidid></search><sort><creationdate>20121009</creationdate><title>Combinatorics of distance-based tree inference</title><author>Pardi, Fabio ; Gascuel, 
Olivier</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c504t-40df0cb5f0dea485dd01782ac3b9d2fbdb0255e56a53180474edcc4eeb6602e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Algorithms</topic><topic>Animals</topic><topic>Bioinformatics</topic><topic>Biological Sciences</topic><topic>Branches</topic><topic>Cluster Analysis</topic><topic>Combinatorics</topic><topic>Computational Biology - methods</topic><topic>Computer Science</topic><topic>Computer Simulation</topic><topic>Estimation methods</topic><topic>Evolution, Molecular</topic><topic>Gene expression</topic><topic>Genes</topic><topic>Heuristic</topic><topic>Humans</topic><topic>Inference</topic><topic>Least squares</topic><topic>Leaves</topic><topic>Life Sciences</topic><topic>Models, Genetic</topic><topic>Molecular evolution</topic><topic>Phylogenetics</topic><topic>Phylogeny</topic><topic>Physical Sciences</topic><topic>Quantitative Methods</topic><topic>Reproducibility of Results</topic><topic>Statistical discrepancies</topic><topic>Taxa</topic><topic>Topology</topic><topic>Trees</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pardi, Fabio</creatorcontrib><creatorcontrib>Gascuel, Olivier</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Neurosciences 
Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Proceedings of the National Academy of Sciences - PNAS</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pardi, Fabio</au><au>Gascuel, Olivier</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Combinatorics of distance-based tree inference</atitle><jtitle>Proceedings of the National Academy of Sciences - PNAS</jtitle><addtitle>Proc Natl Acad Sci U S A</addtitle><date>2012-10-09</date><risdate>2012</risdate><volume>109</volume><issue>41</issue><spage>16443</spage><epage>16448</epage><pages>16443-16448</pages><issn>0027-8424</issn><eissn>1091-6490</eissn><abstract>Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. 
If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances.</abstract><cop>United States</cop><pub>National Academy of Sciences</pub><pmid>23012403</pmid><doi>10.1073/pnas.1118368109</doi><tpages>6</tpages><orcidid>https://orcid.org/0000-0001-8084-1464</orcidid><orcidid>https://orcid.org/0000-0002-9412-9723</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0027-8424 |
ispartof | Proceedings of the National Academy of Sciences - PNAS, 2012-10, Vol.109 (41), p.16443-16448 |
issn | 0027-8424; 1091-6490 |
language | eng |
recordid | cdi_proquest_journals_1095668972 |
source | MEDLINE; JSTOR Archive Collection A-Z Listing; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Animals; Bioinformatics; Biological Sciences; Branches; Cluster Analysis; Combinatorics; Computational Biology - methods; Computer Science; Computer Simulation; Estimation methods; Evolution, Molecular; Gene expression; Genes; Heuristic; Humans; Inference; Least squares; Leaves; Life Sciences; Models, Genetic; Molecular evolution; Phylogenetics; Phylogeny; Physical Sciences; Quantitative Methods; Reproducibility of Results; Statistical discrepancies; Taxa; Topology; Trees |
title | Combinatorics of distance-based tree inference |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T00%3A33%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combinatorics%20of%20distance-based%20tree%20inference&rft.jtitle=Proceedings%20of%20the%20National%20Academy%20of%20Sciences%20-%20PNAS&rft.au=Pardi,%20Fabio&rft.date=2012-10-09&rft.volume=109&rft.issue=41&rft.spage=16443&rft.epage=16448&rft.pages=16443-16448&rft.issn=0027-8424&rft.eissn=1091-6490&rft_id=info:doi/10.1073/pnas.1118368109&rft_dat=%3Cjstor_proqu%3E41763367%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1095668972&rft_id=info:pmid/23012403&rft_jstor_id=41763367&rfr_iscdi=true |
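The description notes that, for a fixed topology, ordinary least-squares (OLS) branch lengths can be written as simple combinatorial formulae, and that minimum evolution selects the topology with the smallest total branch length. The sketch below illustrates both ideas on the smallest nontrivial case, four taxa, using the classical closed-form OLS estimates for a quartet. It is not the general form of the formulae defined in the paper, and the names `pair`, `quartet_ols_lengths` and `min_evolution_quartet` are hypothetical.

```python
# Classical four-taxon OLS formulae (illustration only). `dist` maps
# frozenset({i, j}) to the input distance between taxa i and j.


def pair(dist, i, j):
    return dist[frozenset((i, j))]


def quartet_ols_lengths(dist, split):
    """OLS branch lengths for the unrooted quartet topology xy|zw.

    Returns ({taxon: pendant length}, internal length).
    """
    (x, y), (z, w) = split
    xz, xw, yz, yw = (pair(dist, x, z), pair(dist, x, w),
                      pair(dist, y, z), pair(dist, y, w))
    xy, zw = pair(dist, x, y), pair(dist, z, w)
    pendant = {
        x: xy / 2 + (xz + xw - yz - yw) / 4,
        y: xy / 2 + (yz + yw - xz - xw) / 4,
        z: zw / 2 + (xz + yz - xw - yw) / 4,
        w: zw / 2 + (xw + yw - xz - yz) / 4,
    }
    internal = (xz + xw + yz + yw) / 4 - (xy + zw) / 2
    return pendant, internal


def min_evolution_quartet(dist, taxa):
    """Pick the quartet split whose OLS branch lengths have the smallest sum."""
    a, b, c, d = taxa
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def tree_length(split):
        pendant, internal = quartet_ols_lengths(dist, split)
        return sum(pendant.values()) + internal

    return min(splits, key=tree_length)


# Usage: distances that fit ((a,b),(c,d)) exactly, with branch lengths 1 to 5.
dist = {frozenset(p): x for p, x in {
    ("a", "b"): 3.0, ("a", "c"): 8.0, ("a", "d"): 9.0,
    ("b", "c"): 9.0, ("b", "d"): 10.0, ("c", "d"): 9.0,
}.items()}
print(quartet_ols_lengths(dist, (("a", "b"), ("c", "d"))))
# ({'a': 1.0, 'b': 2.0, 'c': 4.0, 'd': 5.0}, 3.0)
print(min_evolution_quartet(dist, ("a", "b", "c", "d")))
# (('a', 'b'), ('c', 'd'))
```

Summing the four pendant estimates and the internal one gives a total length of S/4 + (d(x,y) + d(z,w))/4 for the split xy|zw, where S is the sum of all six distances. On a quartet, minimum evolution with OLS branch lengths therefore picks the split that minimizes d(x,y) + d(z,w), which is the four-point condition; the two wrong splits above get total length 16.5 against 15.0 for the correct one.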