Testing the rogue taxa hypothesis for clustering instability

Highlights
• Instability in hierarchical trees measured using a novel tree distance.
• Low tree consensus due to flaws in tree building algorithm and not rogue taxa.
• Standard neighbor joining algorithm stability depends on the sample subset used.
• Our novel bubble clustering method creates more stable hie...

Full description

Bibliographic details
Published in: Journal of theoretical biology 2019-07, Vol.472, p.36-45
Main authors: Saunders, Amanda M.; Ashlock, Daniel; Graether, Steffen P.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 45
container_issue
container_start_page 36
container_title Journal of theoretical biology
container_volume 472
creator Saunders, Amanda M.
Ashlock, Daniel
Graether, Steffen P.
description Highlights: • Instability in hierarchical trees measured using a novel tree distance. • Low tree consensus due to flaws in tree building algorithm and not rogue taxa. • Standard neighbor joining algorithm stability depends on the sample subset used. • Our novel bubble clustering method creates more stable hierarchical trees. There have been longstanding concerns about the stability of hierarchical clustering. A suggested explanation for this instability is the presence of “rogue taxa”, i.e. taxa whose removal from a data set can apparently restore stability. In this study, the rogue taxa hypothesis is tested by partitioning a large data set into many smaller ones and checking for rogue behavior. The checking was performed with a standard hierarchical clustering algorithm and with a novel algorithm designed to have greater stability. It was found that rogue taxa cannot reasonably be said to exist because the status of being a rogue taxon depends on the data partition in which the taxon is embedded. In addition to the choice of data used, the choice of algorithm and algorithm parameters can have a large effect on the degree to which a taxon appears rogue. Instability in hierarchical clustering can be increased by problematic data points, but the status of data points being problematic depends not on their biological antecedents, but on their position in the local geometry of the data. The results of this study strongly suggest that instability in traditional hierarchical clustering routines is primarily a problem with the algorithm design.
doi_str_mv 10.1016/j.jtbi.2019.04.002
format Article
fullrecord <record>
<control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2205408950</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0022519319301432</els_id><sourcerecordid>2205408950</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2205408950</pqid></control>
<display><type>article</type><title>Testing the rogue taxa hypothesis for clustering instability</title><source>Access via ScienceDirect (Elsevier)</source><creator>Saunders, Amanda M. ; Ashlock, Daniel ; Graether, Steffen P.</creator><identifier>ISSN: 0022-5193</identifier><identifier>EISSN: 1095-8541</identifier><identifier>DOI: 10.1016/j.jtbi.2019.04.002</identifier><identifier>PMID: 30954506</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Bioinformatics ; Bootstrapping ; Clustering stability ; Hierarchical clustering ; Phylogenetics</subject><ispartof>Journal of theoretical biology, 2019-07, Vol.472, p.36-45</ispartof><rights>2019 Elsevier Ltd</rights><rights>Copyright © 2019 Elsevier Ltd. All rights reserved.</rights><lds50>peer_reviewed</lds50></display>
<addata><au>Saunders, Amanda M.</au><au>Ashlock, Daniel</au><au>Graether, Steffen P.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Testing the rogue taxa hypothesis for clustering instability</atitle><jtitle>Journal of theoretical biology</jtitle><addtitle>J Theor Biol</addtitle><date>2019-07-07</date><risdate>2019</risdate><volume>472</volume><spage>36</spage><epage>45</epage><pages>36-45</pages><issn>0022-5193</issn><eissn>1095-8541</eissn><cop>England</cop><pub>Elsevier Ltd</pub><pmid>30954506</pmid><doi>10.1016/j.jtbi.2019.04.002</doi><tpages>10</tpages></addata>
</record>
fulltext fulltext
identifier ISSN: 0022-5193
ispartof Journal of theoretical biology, 2019-07, Vol.472, p.36-45
issn 0022-5193
1095-8541
language eng
recordid cdi_proquest_miscellaneous_2205408950
source Access via ScienceDirect (Elsevier)
subjects Bioinformatics
Bootstrapping
Clustering stability
Hierarchical clustering
Phylogenetics
title Testing the rogue taxa hypothesis for clustering instability
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T20%3A49%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Testing%20the%20rogue%20taxa%20hypothesis%20for%20clustering%20instability&rft.jtitle=Journal%20of%20theoretical%20biology&rft.au=Saunders,%20Amanda%20M.&rft.date=2019-07-07&rft.volume=472&rft.spage=36&rft.epage=45&rft.pages=36-45&rft.issn=0022-5193&rft.eissn=1095-8541&rft_id=info:doi/10.1016/j.jtbi.2019.04.002&rft_dat=%3Cproquest_cross%3E2205408950%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2205408950&rft_id=info:pmid/30954506&rft_els_id=S0022519319301432&rfr_iscdi=true
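
The abstract's central claim, that whether a taxon looks "rogue" depends on the data partition it is embedded in, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 1-D coordinates, the taxon names, the use of average-linkage clustering (not the paper's neighbor joining or bubble clustering), and the `rogue_score` helper, which counts differing clades and is not the authors' tree distance.

```python
from itertools import combinations
import math

# Hypothetical 1-D coordinates for a few taxa (illustrative values only).
POINTS = {
    "A": (0,), "B": (40,), "C": (90,),
    "P": (10,), "Q": (12,), "R": (100,), "S": (103,),
    "X": (68,),  # the candidate "rogue" taxon
}

def clades(names, points):
    """Average-linkage agglomerative clustering.

    Returns the set of clades (as frozensets of taxon names) created by
    the successive merges, including the root clade.
    """
    clusters = [frozenset([n]) for n in sorted(names)]
    found = set()

    def avg_dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(math.dist(points[x], points[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        found.add(a | b)
    return found

def prune(clade_set, taxon):
    """Remove one taxon from every clade, dropping clades left with < 2 members."""
    return {c - {taxon} for c in clade_set if len(c - {taxon}) > 1}

def rogue_score(subset, taxon, points):
    """Number of clades that differ between the tree built without the taxon
    and the tree built with it (after pruning the taxon back out)."""
    with_taxon = prune(clades(subset | {taxon}, points), taxon)
    without_taxon = clades(subset, points)
    return len(with_taxon ^ without_taxon)

# Two hypothetical partitions of the same data set, each tested with taxon X.
partitions = {"ABC": {"A", "B", "C"}, "PQRS": {"P", "Q", "R", "S"}}
scores = {label: rogue_score(subset, "X", POINTS)
          for label, subset in partitions.items()}
print(scores)  # {'ABC': 2, 'PQRS': 0}
```

In this toy data the same taxon X rearranges the tree of partition ABC (score 2) but leaves partition PQRS untouched (score 0), so its apparent rogue status is a property of the local geometry rather than of the taxon itself.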