Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees....

Bibliographic details
Published in: Biometrika 2017-12, Vol.104 (4), p.901-922
Main authors: NYE, TOM M. W., TANG, XIAOXIAN, WEYENBERG, GRADY, YOSHIDA, RURIKO
Format: Article
Language: eng
Online access: Full text
container_end_page 922
container_issue 4
container_start_page 901
container_title Biometrika
container_volume 104
creator NYE, TOM M. W.
TANG, XIAOXIAN
WEYENBERG, GRADY
YOSHIDA, RURIKO
description Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample’s structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the kth principal component in Euclidean space: the locus of the weighted Fréchet mean of k + 1 vertex trees when the weights vary over the k-simplex. We establish some basic properties of these objects, in particular showing that they have dimension k, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
doi_str_mv 10.1093/biomet/asx047
format Article
fullrecord sourceid jstor_pubme
sourcetype Open Access Repository
jstor_id 48546272
pqid 2001063592
identifier EISSN: 1464-3510
identifier PMID: 29422694
publisher England: Oxford University Press
rights 2017 Biometrika Trust
lds50 peer_reviewed
oa free_for_read
tpages 22
linktohtml https://www.jstor.org/stable/48546272
linktopdf https://www.jstor.org/stable/pdf/48546272
backlink https://www.ncbi.nlm.nih.gov/pubmed/29422694
fulltext fulltext
identifier ISSN: 0006-3444
ispartof Biometrika, 2017-12, Vol.104 (4), p.901-922
issn 0006-3444
1464-3510
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5793493
source JSTOR Mathematics & Statistics; JSTOR Archive Collection A-Z Listing; Oxford University Press Journals All Titles (1996-Current)
title Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T15%3A59%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Principal%20component%20analysis%20and%20the%20locus%20of%20the%20Fr%C3%A9chet%20mean%20in%20the%20space%20of%20phylogenetic%20trees&rft.jtitle=Biometrika&rft.au=NYE,%20TOM%20M.%20W.&rft.date=2017-12&rft.volume=104&rft.issue=4&rft.spage=901&rft.epage=922&rft.pages=901-922&rft.issn=0006-3444&rft.eissn=1464-3510&rft_id=info:doi/10.1093/biomet/asx047&rft_dat=%3Cjstor_pubme%3E48546272%3C/jstor_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2001063592&rft_id=info:pmid/29422694&rft_jstor_id=48546272&rfr_iscdi=true
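
The construction named in the description, the locus of the weighted Fréchet mean of k + 1 vertex points as the weights vary over the k-simplex, can be illustrated in the Euclidean case, where the weighted Fréchet mean (the minimiser of the weighted sum of squared distances) is simply the weighted average. The sketch below covers only that Euclidean analogue; it is not the authors' tree-space algorithm, where the geometry is not Euclidean and no such closed form is assumed. All function names and the example points are hypothetical.

```python
import itertools


def weighted_frechet_mean_euclidean(vertices, weights):
    """Minimiser of sum_i w_i * ||x - v_i||^2 in R^n: the weighted average.

    Euclidean analogue only; this is NOT a tree-space computation.
    """
    total = sum(weights)
    dim = len(vertices[0])
    return [
        sum(w * v[j] for w, v in zip(weights, vertices)) / total
        for j in range(dim)
    ]


def frechet_objective(x, vertices, weights):
    """Weighted sum of squared Euclidean distances from x to the vertices."""
    return sum(
        w * sum((xj - vj) ** 2 for xj, vj in zip(x, v))
        for w, v in zip(weights, vertices)
    )


def simplex_grid(k, steps):
    """Yield weight vectors of length k + 1 on a regular grid of the k-simplex."""
    for combo in itertools.product(range(steps + 1), repeat=k):
        if sum(combo) <= steps:
            last = steps - sum(combo)
            yield [c / steps for c in combo] + [last / steps]


# Hypothetical example: k = 2, so three vertex points in R^3.
vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# The locus: one weighted mean per weight vector on the 2-simplex grid.
locus = [weighted_frechet_mean_euclidean(vertices, w) for w in simplex_grid(2, 4)]
```

In this Euclidean setting the locus is the flat triangle spanned by the three vertices, which matches the intuition of a k-dimensional object playing the role of the kth principal component. In tree space the corresponding surface need not be flat, which is why the description refers to dedicated algorithms for projection and fitting.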