CLUSTER STABILITY ESTIMATION BASED ON A MINIMAL SPANNING TREES APPROACH
Among the areas of data and text mining employed today in science, the economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; for example, the problem of determinin...
Main authors: | Volkovich, Zeev Vladimir; Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora |
---|---|
Format: | Conference proceeding |
Language: | eng |
Online access: | Full text |
container_end_page | 305 |
---|---|
container_issue | |
container_start_page | 299 |
container_title | |
container_volume | 1159 |
creator | Volkovich, Zeev Vladimir; Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora |
description | Among the areas of data and text mining employed today in science, the economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; for example, the problem of determining the true number of clusters has not been satisfactorily solved. In this paper, the problem is addressed by the cluster stability approach. For several candidate numbers of clusters, we estimate the stability of partitions obtained by clustering samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, that connect points from different samples; that is, we use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis of well-mingled samples within the clusters, this statistic is asymptotically normally distributed. Based on this fact, the standard score of the edge count is computed, and the partition quality is represented by the worst cluster, i.e., the one with the minimal standard score. It is natural to expect that the true number of clusters is characterized by the empirical distribution with the shortest left tail. The proposed methodology sequentially constructs the distribution of this value and estimates its left asymmetry. Numerical experiments presented in the paper demonstrate the ability of the approach to detect the true number of clusters. |
doi_str_mv | 10.1063/1.3223945 |
format | Conference Proceeding |
fulltext | fulltext |
identifier | ISSN: 0094-243X |
ispartof | AIP conference proceedings, 2009, Vol.1159, p.299-305 |
issn | 0094-243X |
language | eng |
recordid | cdi_proquest_miscellaneous_34963430 |
source | AIP Journals Complete |
title | CLUSTER STABILITY ESTIMATION BASED ON A MINIMAL SPANNING TREES APPROACH |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T12%3A19%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=CLUSTER%20STABILITY%20ESTIMATION%20BASED%20ON%20A%20MINIMAL%20SPANNING%20TREES%20APPROACH&rft.btitle=AIP%20conference%20proceedings&rft.au=Volkovich,%20Zeev%20Vladimir&rft.date=2009-01-01&rft.volume=1159&rft.spage=299&rft.epage=305&rft.pages=299-305&rft.issn=0094-243X&rft_id=info:doi/10.1063/1.3223945&rft_dat=%3Cproquest%3E34963430%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=34963430&rft_id=info:pmid/&rfr_iscdi=true |
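
The description field above outlines a per-cluster score built from minimal spanning tree edges that join points from different samples. The following is a minimal sketch of that idea, not the authors' implementation. All function names (`mst_edges`, `cross_edges`, `cluster_score`, `partition_score`) are hypothetical, Euclidean distance is an assumption, and the standard score is estimated here by label permutation rather than by the asymptotic normal approximation mentioned in the abstract. The final step described in the abstract, estimating the left asymmetry of the score distribution over repeated samples, is not shown.

```python
import math
import random
import statistics


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the minimal spanning tree as a list of (i, j) index pairs.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known distance from the tree to each point
    best_from = [-1] * n         # tree vertex realising that distance
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the closest point that is not yet in the tree.
        k = min((j for j in range(n) if not in_tree[j]),
                key=best_dist.__getitem__)
        edges.append((best_from[k], k))
        in_tree[k] = True
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges


def cross_edges(edges, labels):
    """Count tree edges whose endpoints come from different samples."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def cluster_score(points, labels, n_perm=500, seed=0):
    """Standard score of the cross-sample edge count for one cluster.

    Assumption: mean and spread under the homogeneity hypothesis are
    estimated by shuffling the sample labels over a fixed tree.
    """
    rng = random.Random(seed)
    edges = mst_edges(points)
    observed = cross_edges(edges, labels)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_counts.append(cross_edges(edges, shuffled))
    mu = statistics.fmean(null_counts)
    sigma = statistics.pstdev(null_counts)
    return (observed - mu) / sigma if sigma > 0 else 0.0


def partition_score(clusters):
    """Quality of a partition: the minimal (worst) cluster score.

    `clusters` is a list of (points, labels) pairs, one per cluster.
    """
    return min(cluster_score(pts, labs) for pts, labs in clusters)
```

A strongly negative score means the two samples are poorly mingled inside a cluster, which in the stability view counts against that number of clusters. The permutation estimate is used only because it is easy to check by hand; it stands in for the closed-form normal approximation and costs `n_perm` passes over the tree edges.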