Testing the rogue taxa hypothesis for clustering instability
Highlights: • Instability in hierarchical trees measured using a novel tree distance. • Low tree consensus due to flaws in tree building algorithm and not rogue taxa. • Standard neighbor joining algorithm stability depends on the sample subset used. • Our novel bubble clustering method creates more stable hie...
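Published in: | Journal of theoretical biology 2019-07, Vol.472, p.36-45 |
---|---|
Main authors: | Saunders, Amanda M. ; Ashlock, Daniel ; Graether, Steffen P. |
Format: | Article |
Language: | eng |
Subjects: | Bioinformatics ; Bootstrapping ; Clustering stability ; Hierarchical clustering ; Phylogenetics |
Online access: | Full text |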
container_end_page | 45 |
---|---|
container_issue | |
container_start_page | 36 |
container_title | Journal of theoretical biology |
container_volume | 472 |
creator | Saunders, Amanda M. ; Ashlock, Daniel ; Graether, Steffen P. |
description | Highlights: • Instability in hierarchical trees measured using a novel tree distance. • Low tree consensus due to flaws in tree building algorithm and not rogue taxa. • Standard neighbor joining algorithm stability depends on the sample subset used. • Our novel bubble clustering method creates more stable hierarchical trees.
There have been longstanding concerns about the stability of hierarchical clustering. A suggested explanation for this instability is the presence of “rogue taxa”, i.e. taxa whose removal from a data set can apparently restore stability. In this study, the rogue taxa hypothesis is tested by partitioning a large data set into many smaller ones and checking for rogue behavior. The checking was performed with a standard hierarchical clustering algorithm and with a novel algorithm designed to have greater stability. It was found that rogue taxa cannot reasonably be said to exist because the status of being a rogue taxon depends on the data partition in which the taxon is embedded. In addition to the choice of data used, the choice of algorithm and algorithm parameters can have a large effect on the degree to which a taxon appears rogue. Instability in hierarchical clustering can be increased by problematic data points, but the status of data points being problematic depends not on their biological antecedents, but on their position in the local geometry of the data. The results of this study strongly suggest that instability in traditional hierarchical clustering routines is primarily a problem with the algorithm design. |
doi_str_mv | 10.1016/j.jtbi.2019.04.002 |
format | Article |
pmid | 30954506 |
publisher | England: Elsevier Ltd |
rights | Copyright © 2019 Elsevier Ltd. All rights reserved. |
fulltext | fulltext |
identifier | ISSN: 0022-5193 |
ispartof | Journal of theoretical biology, 2019-07, Vol.472, p.36-45 |
issn | 0022-5193 1095-8541 |
language | eng |
recordid | cdi_proquest_miscellaneous_2205408950 |
source | Access via ScienceDirect (Elsevier) |
subjects | Bioinformatics ; Bootstrapping ; Clustering stability ; Hierarchical clustering ; Phylogenetics |
title | Testing the rogue taxa hypothesis for clustering instability |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T20%3A49%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Testing%20the%20rogue%20taxa%20hypothesis%20for%20clustering%20instability&rft.jtitle=Journal%20of%20theoretical%20biology&rft.au=Saunders,%20Amanda%20M.&rft.date=2019-07-07&rft.volume=472&rft.spage=36&rft.epage=45&rft.pages=36-45&rft.issn=0022-5193&rft.eissn=1095-8541&rft_id=info:doi/10.1016/j.jtbi.2019.04.002&rft_dat=%3Cproquest_cross%3E2205408950%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2205408950&rft_id=info:pmid/30954506&rft_els_id=S0022519319301432&rfr_iscdi=true |