Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison
Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have dev...
Published in: | PloS one 2013-03, Vol.8 (3), p.e56859-e56859 |
---|---|
Main authors: | Matsen, 4th, Frederick A ; Evans, Steven N |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e56859 |
---|---|
container_issue | 3 |
container_start_page | e56859 |
container_title | PloS one |
container_volume | 8 |
creator | Matsen, 4th, Frederick A ; Evans, Steven N |
description | Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome. |
doi_str_mv | 10.1371/journal.pone.0056859 |
format | Article |
fullrecord | <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_1330895006</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A478325646</galeid><doaj_id>oai_doaj_org_article_457c6c9bdf204a8daf222f61ebab687a</doaj_id><sourcerecordid>A478325646</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</originalsourceid><addsrcrecordid>eNqNk09v1DAQxSMEoqXwDRBYQkJw2CW2YyfpAamqClSqVIl_V2vijLNZOXFqO4ge-eZ42221i3pAOdhyfu9N5sWTZS9pvqS8pB_WbvYj2OXkRlzmuZCVqB9lh7TmbCFZzh_v7A-yZyGsE8QrKZ9mB4yLXBRUHGZ_ztoOyeT7UfcTWKLdsDEcYyAwtiRczRBWRNs5RExQd0zmkBYSV0jChLpPmhD9rOPskThDptW1dR2OGHtNJgsah-RGWohAjPMkwDBZvKkDvg9ufJ49MWADvtiuR9mPT2ffT78sLi4_n5-eXCy0rFlctJLWpkEtK6YZSMlp6ou1QFlbtlyCFKVpjE47jkZgYcpGV5LzOpfUoKH8KHt96ztZF9Q2vaAo53lVizyXiTi_JVoHa5UyGcBfKwe9ujlwvlPgU1sWVSFKLXXdtIblBVQtGMaYkRQbaGRVQvL6uK02NwO2OmXgwe6Z7r8Z-5Xq3C_FRV2wukwG77YG3l3NGKIa-qDRWhjRzZvvpmUleCFFQt_8gz7c3ZbqIDXQj8alunpjqk6KsuJMyGJDLR-g0tPi0Ot0M0yfzvcE7_cEiYn4O3Ywh6DOv339f_by5z77doddIdi4Cs7OsXdj2AeLW1B7F4JHcx8yzdVmUu7SUJuLrbaTkmSvdn_QvehuNPhfiTkRYQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1330895006</pqid></control><display><type>article</type><title>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Free E-Journal (出版社公開部分のみ)</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Matsen, 4th, Frederick A ; Evans, Steven N</creator><contributor>Moustafa, Ahmed</contributor><creatorcontrib>Matsen, 4th, Frederick A ; Evans, Steven N ; Moustafa, Ahmed</creatorcontrib><description>Principal components analysis (PCA) and hierarchical clustering are two 
of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. 
We present these methods and illustrate their use with data from the human microbiome.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0056859</identifier><identifier>PMID: 23505415</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Bioinformatics ; Biology ; Cluster Analysis ; Clustering ; Coloring ; Computer Science ; Environmental Microbiology ; Female ; Genomes ; Humans ; Leaves ; Mathematics ; Metagenome - genetics ; Methods ; Microbial activity ; Microbiota ; Microorganisms ; Phylogenetics ; Phylogeny ; Principal Component Analysis ; Principal components analysis ; Taxa ; Taxonomy ; Thickening ; Vagina - microbiology</subject><ispartof>PloS one, 2013-03, Vol.8 (3), p.e56859-e56859</ispartof><rights>COPYRIGHT 2013 Public Library of Science</rights><rights>2013 Matsen, Evans. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2013 Matsen, Evans 2013 Matsen, Evans</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</citedby><cites>FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3594297/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3594297/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79343,79344</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23505415$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Moustafa, Ahmed</contributor><creatorcontrib>Matsen, 4th, Frederick A</creatorcontrib><creatorcontrib>Evans, Steven N</creatorcontrib><title>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. 
Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome.</description><subject>Algorithms</subject><subject>Bioinformatics</subject><subject>Biology</subject><subject>Cluster Analysis</subject><subject>Clustering</subject><subject>Coloring</subject><subject>Computer Science</subject><subject>Environmental Microbiology</subject><subject>Female</subject><subject>Genomes</subject><subject>Humans</subject><subject>Leaves</subject><subject>Mathematics</subject><subject>Metagenome - genetics</subject><subject>Methods</subject><subject>Microbial activity</subject><subject>Microbiota</subject><subject>Microorganisms</subject><subject>Phylogenetics</subject><subject>Phylogeny</subject><subject>Principal Component Analysis</subject><subject>Principal components analysis</subject><subject>Taxa</subject><subject>Taxonomy</subject><subject>Thickening</subject><subject>Vagina - 
microbiology</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><sourceid>DOA</sourceid><recordid>eNqNk09v1DAQxSMEoqXwDRBYQkJw2CW2YyfpAamqClSqVIl_V2vijLNZOXFqO4ge-eZ42221i3pAOdhyfu9N5sWTZS9pvqS8pB_WbvYj2OXkRlzmuZCVqB9lh7TmbCFZzh_v7A-yZyGsE8QrKZ9mB4yLXBRUHGZ_ztoOyeT7UfcTWKLdsDEcYyAwtiRczRBWRNs5RExQd0zmkBYSV0jChLpPmhD9rOPskThDptW1dR2OGHtNJgsah-RGWohAjPMkwDBZvKkDvg9ufJ49MWADvtiuR9mPT2ffT78sLi4_n5-eXCy0rFlctJLWpkEtK6YZSMlp6ou1QFlbtlyCFKVpjE47jkZgYcpGV5LzOpfUoKH8KHt96ztZF9Q2vaAo53lVizyXiTi_JVoHa5UyGcBfKwe9ujlwvlPgU1sWVSFKLXXdtIblBVQtGMaYkRQbaGRVQvL6uK02NwO2OmXgwe6Z7r8Z-5Xq3C_FRV2wukwG77YG3l3NGKIa-qDRWhjRzZvvpmUleCFFQt_8gz7c3ZbqIDXQj8alunpjqk6KsuJMyGJDLR-g0tPi0Ot0M0yfzvcE7_cEiYn4O3Ywh6DOv339f_by5z77doddIdi4Cs7OsXdj2AeLW1B7F4JHcx8yzdVmUu7SUJuLrbaTkmSvdn_QvehuNPhfiTkRYQ</recordid><startdate>20130311</startdate><enddate>20130311</enddate><creator>Matsen, 4th, Frederick A</creator><creator>Evans, Steven N</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20130311</creationdate><title>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</title><author>Matsen, 4th, Frederick A ; Evans, Steven 
N</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-d619fbec682c2a66312032da12d7d36a657fbfc36a3ef5e4f7bc86339061fef13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Bioinformatics</topic><topic>Biology</topic><topic>Cluster Analysis</topic><topic>Clustering</topic><topic>Coloring</topic><topic>Computer Science</topic><topic>Environmental Microbiology</topic><topic>Female</topic><topic>Genomes</topic><topic>Humans</topic><topic>Leaves</topic><topic>Mathematics</topic><topic>Metagenome - genetics</topic><topic>Methods</topic><topic>Microbial activity</topic><topic>Microbiota</topic><topic>Microorganisms</topic><topic>Phylogenetics</topic><topic>Phylogeny</topic><topic>Principal Component Analysis</topic><topic>Principal components analysis</topic><topic>Taxa</topic><topic>Taxonomy</topic><topic>Thickening</topic><topic>Vagina - microbiology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Matsen, 4th, Frederick A</creatorcontrib><creatorcontrib>Evans, Steven N</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>ProQuest Nursing & Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological & 
Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>ProQuest Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>ProQuest Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Database (1962 - current)</collection><collection>Agricultural & Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central 
Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>https://resources.nclive.org/materials</collection><collection>Nursing & Allied Health Database (Alumni Edition)</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>Biological Sciences</collection><collection>Agriculture Science Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>ProQuest Engineering Database</collection><collection>Nursing & Allied Health Premium</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Matsen, 4th, Frederick A</au><au>Evans, Steven N</au><au>Moustafa, Ahmed</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2013-03-11</date><risdate>2013</risdate><volume>8</volume><issue>3</issue><spage>e56859</spage><epage>e56859</epage><pages>e56859-e56859</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate "average" of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. 
We present these methods and illustrate their use with data from the human microbiome.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>23505415</pmid><doi>10.1371/journal.pone.0056859</doi><tpages>e56859</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2013-03, Vol.8 (3), p.e56859-e56859 |
issn | 1932-6203 1932-6203 |
language | eng |
recordid | cdi_plos_journals_1330895006 |
source | Public Library of Science (PLoS) Journals Open Access; MEDLINE; DOAJ Directory of Open Access Journals; Free E-Journal (出版社公開部分のみ); PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Algorithms ; Bioinformatics ; Biology ; Cluster Analysis ; Clustering ; Coloring ; Computer Science ; Environmental Microbiology ; Female ; Genomes ; Humans ; Leaves ; Mathematics ; Metagenome - genetics ; Methods ; Microbial activity ; Microbiota ; Microorganisms ; Phylogenetics ; Phylogeny ; Principal Component Analysis ; Principal components analysis ; Taxa ; Taxonomy ; Thickening ; Vagina - microbiology |
title | Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T07%3A00%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Edge%20principal%20components%20and%20squash%20clustering:%20using%20the%20special%20structure%20of%20phylogenetic%20placement%20data%20for%20sample%20comparison&rft.jtitle=PloS%20one&rft.au=Matsen,%204th,%20Frederick%20A&rft.date=2013-03-11&rft.volume=8&rft.issue=3&rft.spage=e56859&rft.epage=e56859&rft.pages=e56859-e56859&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0056859&rft_dat=%3Cgale_plos_%3EA478325646%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1330895006&rft_id=info:pmid/23505415&rft_galeid=A478325646&rft_doaj_id=oai_doaj_org_article_457c6c9bdf204a8daf222f61ebab687a&rfr_iscdi=true |
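
Illustrative sketch (not part of the catalog record): the description field above says that edge principal components are signed weights on the edges of the phylogenetic tree, and that squash clustering joins samples into an "average" whose edge length is a distance between the averaged samples. The Python below is a minimal sketch of those two ideas on a made-up four-edge tree. It is not the authors' implementation. The tree, the sample names, the sign convention (distal mass minus proximal mass), the choice of an earth mover's style distance, the weighting of the average by sample count, and the simplification that each unit of mass sits at the lower end of its edge are all assumptions made for illustration.

```python
# Toy reference tree (hypothetical): each non-root node maps to
# (parent, length of the edge directly above it). Edges are named by their child node.
TREE = {
    "A": ("X", 1.0),
    "B": ("X", 1.0),
    "X": ("root", 0.5),
    "C": ("root", 2.0),
}

def mass_below(sample):
    """For every edge, return the total sample mass at or below that edge."""
    below = {node: 0.0 for node in TREE}
    for node, mass in sample.items():
        # Walk up to the root, crediting the mass to every edge on the path.
        while node in TREE:
            below[node] += mass
            node = TREE[node][0]
    return below

def edge_differences(sample):
    """One row of an edge-by-sample matrix: distal mass minus proximal mass per edge."""
    total = sum(sample.values())
    return {edge: m - (total - m) for edge, m in mass_below(sample).items()}

def tree_distance(p, q):
    """Earth mover's style distance: edge length times mass imbalance, summed over edges."""
    bp, bq = mass_below(p), mass_below(q)
    return sum(length * abs(bp[e] - bq[e]) for e, (_, length) in TREE.items())

def squash(p, q, n_p=1, n_q=1):
    """Average two mass distributions, weighted by how many samples each represents."""
    nodes = set(p) | set(q)
    return {n: (n_p * p.get(n, 0.0) + n_q * q.get(n, 0.0)) / (n_p + n_q) for n in nodes}

# Three hypothetical samples, each with all of its mass on a single leaf edge.
s1, s2, s3 = {"A": 1.0}, {"B": 1.0}, {"C": 1.0}

# s1 and s2 hold closely related taxa: their rows differ only on edges A and B.
row1, row2 = edge_differences(s1), edge_differences(s2)

# s1 and s2 are the closest pair, so a squash step would merge them first.
merged = squash(s1, s2)
# Length of the clustering-tree edge from the merged node down to s1.
edge_to_s1 = tree_distance(merged, s1)
```

A principal components analysis run on rows such as `row1` and `row2` (one row per sample, one column per edge) would then yield axes that are themselves per-edge weights, which is what makes them drawable on the tree. In this toy case `tree_distance(s1, s2)` is 2.0 and `tree_distance(s1, s3)` is 3.5, matching the path lengths between the corresponding leaves.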