Combinatorics of distance-based tree inference

Bibliographic details
Published in: Proceedings of the National Academy of Sciences - PNAS 2012-10, Vol.109 (41), p.16443-16448
Main authors: Pardi, Fabio; Gascuel, Olivier
Format: Article
Language: English
Online access: Full text
container_end_page 16448
container_issue 41
container_start_page 16443
container_title Proceedings of the National Academy of Sciences - PNAS
container_volume 109
creator Pardi, Fabio
Gascuel, Olivier
description Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances.
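The abstract's central object is a closed-form ("combinatorial") expression for the optimal branch lengths of a fixed topology. The following is a minimal Python sketch of one such expression. It assumes the subtree-average form of the ordinary least squares (OLS) edge-length formulas that is common in the distance-based phylogenetics literature. It is an illustration only: it is not the general form defined by Pardi and Gascuel, and it does not implement their quadratic-time NNI procedure. The frozenset-keyed distance dictionary and all function names (`avg`, `ols_internal`, `ols_pendant`, `quartet_length`, `best_quartet`) are hypothetical. For an internal edge separating subtrees A, B from C, E, the sketch uses the weight λ = (|A||E| + |B||C|) / ((|A| + |B|)(|C| + |E|)); treat that weight as an assumption to check against the paper.

```python
"""Sketch (assumed formulas): OLS branch lengths from subtree averages,
plus minimum evolution on a single quartet."""


def dist(D, x, y):
    # Hypothetical storage: D maps frozenset({x, y}) -> pairwise distance.
    return D[frozenset((x, y))]


def avg(D, X, Y):
    # Ordinary (unweighted) mean of d(x, y) over all x in X and y in Y.
    return sum(dist(D, x, y) for x in X for y in Y) / (len(X) * len(Y))


def ols_internal(D, A, B, C, E):
    # Internal edge: one endpoint joins subtrees A and B, the other C and E.
    # The weight lam is an assumed OLS weighting based on subtree sizes.
    lam = (len(A) * len(E) + len(B) * len(C)) / ((len(A) + len(B)) * (len(C) + len(E)))
    cross = (lam * (avg(D, A, C) + avg(D, B, E))
             + (1 - lam) * (avg(D, A, E) + avg(D, B, C)))
    return 0.5 * (cross - avg(D, A, B) - avg(D, C, E))


def ols_pendant(D, leaf, B, C):
    # Pendant edge of `leaf`; its other endpoint joins subtrees B and C.
    return 0.5 * (avg(D, [leaf], B) + avg(D, [leaf], C) - avg(D, B, C))


def quartet_length(D, a, b, c, e):
    # Total OLS branch length of the quartet tree ab|ce (its minimum evolution score).
    pendants = (ols_pendant(D, a, [b], [c, e]) + ols_pendant(D, b, [a], [c, e])
                + ols_pendant(D, c, [e], [a, b]) + ols_pendant(D, e, [c], [a, b]))
    return pendants + ols_internal(D, [a], [b], [c], [e])


def best_quartet(D, a, b, c, e):
    # Minimum evolution: keep the split whose OLS tree is shortest.
    splits = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    return min(splits, key=lambda s: quartet_length(D, *s[0], *s[1]))


if __name__ == "__main__":
    # Additive distances from the tree ab|cd (pendant lengths 1, 2, 3, 4; internal 5).
    pairs = {("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 9,
             ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11}
    D = {frozenset(p): float(v) for p, v in pairs.items()}
    print(ols_internal(D, ["a"], ["b"], ["c"], ["d"]))  # 5.0
    print(best_quartet(D, "a", "b", "c", "d"))          # (('a', 'b'), ('c', 'd'))
```

On four taxa, the total OLS length of the split ab|cd reduces to (sum of all six distances)/4 + (d_ab + d_cd)/4. Minimum evolution therefore picks the split with the smallest d_ab + d_cd. For additive distances with a positive internal edge, that is the true split, which is a small instance of the consistency property described in the abstract.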
doi_str_mv 10.1073/pnas.1118368109
format Article
pmid 23012403
jstor_id 41763367
publisher United States: National Academy of Sciences
orcid 0000-0001-8084-1464; 0000-0002-9412-9723
rights copyright © 1993-2008 National Academy of Sciences of the United States of America; Copyright National Academy of Sciences Oct 9, 2012; Distributed under a Creative Commons Attribution 4.0 International License
backlink https://www.ncbi.nlm.nih.gov/pubmed/23012403 (MEDLINE/PubMed); https://hal-lirmm.ccsd.cnrs.fr/lirmm-00726361 (HAL)
identifier ISSN: 0027-8424
ispartof Proceedings of the National Academy of Sciences - PNAS, 2012-10, Vol.109 (41), p.16443-16448
issn 0027-8424
1091-6490
language eng
recordid cdi_proquest_journals_1095668972
source MEDLINE; JSTOR Archive Collection A-Z Listing; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry
subjects Algorithms
Animals
Bioinformatics
Biological Sciences
Branches
Cluster Analysis
Combinatorics
Computational Biology - methods
Computer Science
Computer Simulation
Estimation methods
Evolution, Molecular
Gene expression
Genes
Heuristic
Humans
Inference
Least squares
Leaves
Life Sciences
Models, Genetic
Molecular evolution
Phylogenetics
Phylogeny
Physical Sciences
Quantitative Methods
Reproducibility of Results
Statistical discrepancies
Taxa
Topology
Trees
title Combinatorics of distance-based tree inference
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T00%3A33%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combinatorics%20of%20distance-based%20tree%20inference&rft.jtitle=Proceedings%20of%20the%20National%20Academy%20of%20Sciences%20-%20PNAS&rft.au=Pardi,%20Fabio&rft.date=2012-10-09&rft.volume=109&rft.issue=41&rft.spage=16443&rft.epage=16448&rft.pages=16443-16448&rft.issn=0027-8424&rft.eissn=1091-6490&rft_id=info:doi/10.1073/pnas.1118368109&rft_dat=%3Cjstor_proqu%3E41763367%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1095668972&rft_id=info:pmid/23012403&rft_jstor_id=41763367&rfr_iscdi=true