Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison

Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have dev...

Bibliographic details
Published in: PloS one 2013-03, Vol.8 (3), p.e56859-e56859
Main authors: Matsen, 4th, Frederick A; Evans, Steven N
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e56859
container_issue 3
container_start_page e56859
container_title PloS one
container_volume 8
creator Matsen, 4th, Frederick A
Evans, Steven N
description Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome.
doi_str_mv 10.1371/journal.pone.0056859
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_1330895006</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A478325646</galeid><doaj_id>oai_doaj_org_article_457c6c9bdf204a8daf222f61ebab687a</doaj_id><sourcerecordid>A478325646</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</originalsourceid><addsrcrecordid>eNqNk09v1DAQxSMEoqXwDRBYQkJw2CW2YyfpAamqClSqVIl_V2vijLNZOXFqO4ge-eZ42221i3pAOdhyfu9N5sWTZS9pvqS8pB_WbvYj2OXkRlzmuZCVqB9lh7TmbCFZzh_v7A-yZyGsE8QrKZ9mB4yLXBRUHGZ_ztoOyeT7UfcTWKLdsDEcYyAwtiRczRBWRNs5RExQd0zmkBYSV0jChLpPmhD9rOPskThDptW1dR2OGHtNJgsah-RGWohAjPMkwDBZvKkDvg9ufJ49MWADvtiuR9mPT2ffT78sLi4_n5-eXCy0rFlctJLWpkEtK6YZSMlp6ou1QFlbtlyCFKVpjE47jkZgYcpGV5LzOpfUoKH8KHt96ztZF9Q2vaAo53lVizyXiTi_JVoHa5UyGcBfKwe9ujlwvlPgU1sWVSFKLXXdtIblBVQtGMaYkRQbaGRVQvL6uK02NwO2OmXgwe6Z7r8Z-5Xq3C_FRV2wukwG77YG3l3NGKIa-qDRWhjRzZvvpmUleCFFQt_8gz7c3ZbqIDXQj8alunpjqk6KsuJMyGJDLR-g0tPi0Ot0M0yfzvcE7_cEiYn4O3Ywh6DOv339f_by5z77doddIdi4Cs7OsXdj2AeLW1B7F4JHcx8yzdVmUu7SUJuLrbaTkmSvdn_QvehuNPhfiTkRYQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1330895006</pqid></control><display><type>article</type><title>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Free E-Journal (出版社公開部分のみ)</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Matsen, 4th, Frederick A ; Evans, Steven N</creator><contributor>Moustafa, Ahmed</contributor><creatorcontrib>Matsen, 4th, Frederick A ; Evans, Steven N ; Moustafa, Ahmed</creatorcontrib><description>Principal components analysis (PCA) and hierarchical clustering are two of 
the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. 
We present these methods and illustrate their use with data from the human microbiome.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0056859</identifier><identifier>PMID: 23505415</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Bioinformatics ; Biology ; Cluster Analysis ; Clustering ; Coloring ; Computer Science ; Environmental Microbiology ; Female ; Genomes ; Humans ; Leaves ; Mathematics ; Metagenome - genetics ; Methods ; Microbial activity ; Microbiota ; Microorganisms ; Phylogenetics ; Phylogeny ; Principal Component Analysis ; Principal components analysis ; Taxa ; Taxonomy ; Thickening ; Vagina - microbiology</subject><ispartof>PloS one, 2013-03, Vol.8 (3), p.e56859-e56859</ispartof><rights>COPYRIGHT 2013 Public Library of Science</rights><rights>2013 Matsen, Evans. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2013 Matsen, Evans 2013 Matsen, Evans</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</citedby><cites>FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3594297/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3594297/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79343,79344</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23505415$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Moustafa, Ahmed</contributor><creatorcontrib>Matsen, 4th, Frederick A</creatorcontrib><creatorcontrib>Evans, Steven N</creatorcontrib><title>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. 
Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome.</description><subject>Algorithms</subject><subject>Bioinformatics</subject><subject>Biology</subject><subject>Cluster Analysis</subject><subject>Clustering</subject><subject>Coloring</subject><subject>Computer Science</subject><subject>Environmental Microbiology</subject><subject>Female</subject><subject>Genomes</subject><subject>Humans</subject><subject>Leaves</subject><subject>Mathematics</subject><subject>Metagenome - genetics</subject><subject>Methods</subject><subject>Microbial activity</subject><subject>Microbiota</subject><subject>Microorganisms</subject><subject>Phylogenetics</subject><subject>Phylogeny</subject><subject>Principal Component Analysis</subject><subject>Principal components analysis</subject><subject>Taxa</subject><subject>Taxonomy</subject><subject>Thickening</subject><subject>Vagina - 
microbiology</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><sourceid>DOA</sourceid><recordid>eNqNk09v1DAQxSMEoqXwDRBYQkJw2CW2YyfpAamqClSqVIl_V2vijLNZOXFqO4ge-eZ42221i3pAOdhyfu9N5sWTZS9pvqS8pB_WbvYj2OXkRlzmuZCVqB9lh7TmbCFZzh_v7A-yZyGsE8QrKZ9mB4yLXBRUHGZ_ztoOyeT7UfcTWKLdsDEcYyAwtiRczRBWRNs5RExQd0zmkBYSV0jChLpPmhD9rOPskThDptW1dR2OGHtNJgsah-RGWohAjPMkwDBZvKkDvg9ufJ49MWADvtiuR9mPT2ffT78sLi4_n5-eXCy0rFlctJLWpkEtK6YZSMlp6ou1QFlbtlyCFKVpjE47jkZgYcpGV5LzOpfUoKH8KHt96ztZF9Q2vaAo53lVizyXiTi_JVoHa5UyGcBfKwe9ujlwvlPgU1sWVSFKLXXdtIblBVQtGMaYkRQbaGRVQvL6uK02NwO2OmXgwe6Z7r8Z-5Xq3C_FRV2wukwG77YG3l3NGKIa-qDRWhjRzZvvpmUleCFFQt_8gz7c3ZbqIDXQj8alunpjqk6KsuJMyGJDLR-g0tPi0Ot0M0yfzvcE7_cEiYn4O3Ywh6DOv339f_by5z77doddIdi4Cs7OsXdj2AeLW1B7F4JHcx8yzdVmUu7SUJuLrbaTkmSvdn_QvehuNPhfiTkRYQ</recordid><startdate>20130311</startdate><enddate>20130311</enddate><creator>Matsen, 4th, Frederick A</creator><creator>Evans, Steven N</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20130311</creationdate><title>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</title><author>Matsen, 4th, Frederick A ; Evans, Steven 
N</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Bioinformatics</topic><topic>Biology</topic><topic>Cluster Analysis</topic><topic>Clustering</topic><topic>Coloring</topic><topic>Computer Science</topic><topic>Environmental Microbiology</topic><topic>Female</topic><topic>Genomes</topic><topic>Humans</topic><topic>Leaves</topic><topic>Mathematics</topic><topic>Metagenome - genetics</topic><topic>Methods</topic><topic>Microbial activity</topic><topic>Microbiota</topic><topic>Microorganisms</topic><topic>Phylogenetics</topic><topic>Phylogeny</topic><topic>Principal Component Analysis</topic><topic>Principal components analysis</topic><topic>Taxa</topic><topic>Taxonomy</topic><topic>Thickening</topic><topic>Vagina - microbiology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Matsen, 4th, Frederick A</creatorcontrib><creatorcontrib>Evans, Steven N</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>ProQuest Nursing &amp; Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological &amp; 
Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>ProQuest Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>ProQuest Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Database‎ (1962 - current)</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central 
Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>https://resources.nclive.org/materials</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>Biological Sciences</collection><collection>Agriculture Science Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>ProQuest Engineering Database</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Matsen, 4th, Frederick A</au><au>Evans, Steven N</au><au>Moustafa, Ahmed</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2013-03-11</date><risdate>2013</risdate><volume>8</volume><issue>3</issue><spage>e56859</spage><epage>e56859</epage><pages>e56859-e56859</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. 
We present these methods and illustrate their use with data from the human microbiome.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>23505415</pmid><doi>10.1371/journal.pone.0056859</doi><tpages>e56859</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2013-03, Vol.8 (3), p.e56859-e56859
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_1330895006
source Public Library of Science (PLoS) Journals Open Access; MEDLINE; DOAJ Directory of Open Access Journals; Free E-Journal (出版社公開部分のみ); PubMed Central; Free Full-Text Journals in Chemistry
subjects Algorithms
Bioinformatics
Biology
Cluster Analysis
Clustering
Coloring
Computer Science
Environmental Microbiology
Female
Genomes
Humans
Leaves
Mathematics
Metagenome - genetics
Methods
Microbial activity
Microbiota
Microorganisms
Phylogenetics
Phylogeny
Principal Component Analysis
Principal components analysis
Taxa
Taxonomy
Thickening
Vagina - microbiology
title Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T07%3A00%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Edge%20principal%20components%20and%20squash%20clustering:%20using%20the%20special%20structure%20of%20phylogenetic%20placement%20data%20for%20sample%20comparison&rft.jtitle=PloS%20one&rft.au=Matsen,%204th,%20Frederick%20A&rft.date=2013-03-11&rft.volume=8&rft.issue=3&rft.spage=e56859&rft.epage=e56859&rft.pages=e56859-e56859&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0056859&rft_dat=%3Cgale_plos_%3EA478325646%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1330895006&rft_id=info:pmid/23505415&rft_galeid=A478325646&rft_doaj_id=oai_doaj_org_article_457c6c9bdf204a8daf222f61ebab687a&rfr_iscdi=true
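
The description field above characterizes squash clustering as producing a rooted tree in which each internal node is an "average" of the samples below it, and each edge length is a distance between the averaged samples at its two endpoints. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes each sample is a vector of mass per tree edge, uses a plain L1 distance as a stand-in for the "suitably defined distance" the abstract mentions, and weights averages by leaf count; the names `squash_cluster` and `l1_distance` are hypothetical.

```python
import numpy as np


def l1_distance(p, q):
    # Placeholder distance (an assumption): the record does not say which
    # distance the paper uses, so any metric on mass vectors can be passed in.
    return float(np.abs(p - q).sum())


def squash_cluster(samples, distance=l1_distance):
    """Agglomerative clustering that merges by averaging ("squashing").

    samples: dict mapping sample name -> 1-D sequence of mass per edge.
    Returns a list of merges (parent, child_a, len_a, child_b, len_b), where
    each length is the distance between the parent's averaged vector and the
    child's vector.
    """
    # Each active cluster stores its averaged vector and its number of leaves.
    active = {name: (np.asarray(v, dtype=float), 1) for name, v in samples.items()}
    merges = []
    while len(active) > 1:
        names = sorted(active)
        # Pick the closest pair of currently active clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(names) for y in names[i + 1:]),
            key=lambda pair: distance(active[pair[0]][0], active[pair[1]][0]),
        )
        (va, na), (vb, nb) = active.pop(a), active.pop(b)
        # The internal node is the leaf-count-weighted average of its children
        # (the weighting scheme is an assumption of this sketch).
        merged = (na * va + nb * vb) / (na + nb)
        parent = f"({a},{b})"
        merges.append((parent, a, distance(merged, va), b, distance(merged, vb)))
        active[parent] = (merged, na + nb)
    return merges
```

Because every internal node carries an actual averaged sample, an edge length here is a distance between two concrete mass vectors. That is the contrast the abstract draws with UPGMA, whose heights are averages of pairwise distances.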