A Split-Merge Framework for Comparing Clusterings
Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a gene...
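Main authors: | Xiang, Qiaoliang ; Mao, Qi ; Chai, Kian Ming ; Chieu, Hai Leong ; Tsang, Ivor ; Zhao, Zhendong |
---|---|
Format: | Article |
Language: | eng |
Keywords: | Computer Science - Learning ; Statistics - Machine Learning |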
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Xiang, Qiaoliang Mao, Qi Chai, Kian Ming Chieu, Hai Leong Tsang, Ivor Zhao, Zhendong |
description | Clustering evaluation measures are frequently used to evaluate the
performance of algorithms. However, most measures are not properly normalized
and ignore some information in the inherent structure of clusterings. We model
the relation between two clusterings as a bipartite graph and propose a general
component-based decomposition formula based on the components of the graph.
Most existing measures are examples of this formula. In order to satisfy
consistency in the component, we further propose a split-merge framework for
comparing clusterings of different data sets. Our framework gives measures that
are conditionally normalized, and it can make use of data point information,
such as feature vectors and pairwise distances. We use an entropy-based
instance of the framework and a coreference resolution data set to demonstrate
empirically the utility of our framework over other measures. |
doi_str_mv | 10.48550/arxiv.1206.6475 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1206_6475</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1206_6475</sourcerecordid><originalsourceid>FETCH-LOGICAL-a655-5894483f1fb9887710ea1ef41b1b9fc9266da979f2fee522649038cf5e49e1f73</originalsourceid><addsrcrecordid>eNotzrtug0AQheFtXES2-1TRvgBkZ9lraaHgRCJKEXq0kBkLGQxafMvbRySpzl8dfYw9gkiV01o8h3jvrilIYVKjrH5gsOOfU9-dk3eMB-RFDAPexnjkNEaej8MUYnc68Ly_zGdcct6wFYV-xu3_rllVvFT5a1J-7N_yXZkEo3WinVfKZQTUeOesBYEBkBQ00HhqvTTmK3jrSRKiltIoLzLXkkblEchma_b0d_tLrqfYDSF-1wu9XujZD75aPVg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A Split-Merge Framework for Comparing Clusterings</title><source>arXiv.org</source><creator>Xiang, Qiaoliang ; Mao, Qi ; Chai, Kian Ming ; Chieu, Hai Leong ; Tsang, Ivor ; Zhao, Zhendong</creator><creatorcontrib>Xiang, Qiaoliang ; Mao, Qi ; Chai, Kian Ming ; Chieu, Hai Leong ; Tsang, Ivor ; Zhao, Zhendong</creatorcontrib><description>Clustering evaluation measures are frequently used to evaluate the
performance of algorithms. However, most measures are not properly normalized
and ignore some information in the inherent structure of clusterings. We model
the relation between two clusterings as a bipartite graph and propose a general
component-based decomposition formula based on the components of the graph.
Most existing measures are examples of this formula. In order to satisfy
consistency in the component, we further propose a split-merge framework for
comparing clusterings of different data sets. Our framework gives measures that
are conditionally normalized, and it can make use of data point information,
such as feature vectors and pairwise distances. We use an entropy-based
instance of the framework and a coreference resolution data set to demonstrate
empirically the utility of our framework over other measures.</description><identifier>DOI: 10.48550/arxiv.1206.6475</identifier><language>eng</language><subject>Computer Science - Learning ; Statistics - Machine Learning</subject><creationdate>2012-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1206.6475$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1206.6475$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Xiang, Qiaoliang</creatorcontrib><creatorcontrib>Mao, Qi</creatorcontrib><creatorcontrib>Chai, Kian Ming</creatorcontrib><creatorcontrib>Chieu, Hai Leong</creatorcontrib><creatorcontrib>Tsang, Ivor</creatorcontrib><creatorcontrib>Zhao, Zhendong</creatorcontrib><title>A Split-Merge Framework for Comparing Clusterings</title><description>Clustering evaluation measures are frequently used to evaluate the
performance of algorithms. However, most measures are not properly normalized
and ignore some information in the inherent structure of clusterings. We model
the relation between two clusterings as a bipartite graph and propose a general
component-based decomposition formula based on the components of the graph.
Most existing measures are examples of this formula. In order to satisfy
consistency in the component, we further propose a split-merge framework for
comparing clusterings of different data sets. Our framework gives measures that
are conditionally normalized, and it can make use of data point information,
such as feature vectors and pairwise distances. We use an entropy-based
instance of the framework and a coreference resolution data set to demonstrate
empirically the utility of our framework over other measures.</description><subject>Computer Science - Learning</subject><subject>Statistics - Machine Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzrtug0AQheFtXES2-1TRvgBkZ9lraaHgRCJKEXq0kBkLGQxafMvbRySpzl8dfYw9gkiV01o8h3jvrilIYVKjrH5gsOOfU9-dk3eMB-RFDAPexnjkNEaej8MUYnc68Ly_zGdcct6wFYV-xu3_rllVvFT5a1J-7N_yXZkEo3WinVfKZQTUeOesBYEBkBQ00HhqvTTmK3jrSRKiltIoLzLXkkblEchma_b0d_tLrqfYDSF-1wu9XujZD75aPVg</recordid><startdate>20120627</startdate><enddate>20120627</enddate><creator>Xiang, Qiaoliang</creator><creator>Mao, Qi</creator><creator>Chai, Kian Ming</creator><creator>Chieu, Hai Leong</creator><creator>Tsang, Ivor</creator><creator>Zhao, Zhendong</creator><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20120627</creationdate><title>A Split-Merge Framework for Comparing Clusterings</title><author>Xiang, Qiaoliang ; Mao, Qi ; Chai, Kian Ming ; Chieu, Hai Leong ; Tsang, Ivor ; Zhao, Zhendong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a655-5894483f1fb9887710ea1ef41b1b9fc9266da979f2fee522649038cf5e49e1f73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Computer Science - Learning</topic><topic>Statistics - Machine Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Xiang, Qiaoliang</creatorcontrib><creatorcontrib>Mao, Qi</creatorcontrib><creatorcontrib>Chai, Kian Ming</creatorcontrib><creatorcontrib>Chieu, Hai Leong</creatorcontrib><creatorcontrib>Tsang, Ivor</creatorcontrib><creatorcontrib>Zhao, Zhendong</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xiang, Qiaoliang</au><au>Mao, Qi</au><au>Chai, Kian Ming</au><au>Chieu, Hai Leong</au><au>Tsang, Ivor</au><au>Zhao, Zhendong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Split-Merge Framework for Comparing Clusterings</atitle><date>2012-06-27</date><risdate>2012</risdate><abstract>Clustering evaluation measures are frequently used to evaluate the
performance of algorithms. However, most measures are not properly normalized
and ignore some information in the inherent structure of clusterings. We model
the relation between two clusterings as a bipartite graph and propose a general
component-based decomposition formula based on the components of the graph.
Most existing measures are examples of this formula. In order to satisfy
consistency in the component, we further propose a split-merge framework for
comparing clusterings of different data sets. Our framework gives measures that
are conditionally normalized, and it can make use of data point information,
such as feature vectors and pairwise distances. We use an entropy-based
instance of the framework and a coreference resolution data set to demonstrate
empirically the utility of our framework over other measures.</abstract><doi>10.48550/arxiv.1206.6475</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1206.6475 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_1206_6475 |
source | arXiv.org |
subjects | Computer Science - Learning Statistics - Machine Learning |
title | A Split-Merge Framework for Comparing Clusterings |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T04%3A07%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Split-Merge%20Framework%20for%20Comparing%20Clusterings&rft.au=Xiang,%20Qiaoliang&rft.date=2012-06-27&rft_id=info:doi/10.48550/arxiv.1206.6475&rft_dat=%3Carxiv_GOX%3E1206_6475%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
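
The description field above says that the relation between two clusterings is modeled as a bipartite graph whose components drive the decomposition. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes that each cluster is a node, that two clusters from different clusterings are linked when they share at least one data point, and that clusterings are given as per-point label lists. The function name `bipartite_components` and the `"A"`/`"B"` side tags are hypothetical.

```python
from collections import defaultdict


def bipartite_components(labels_a, labels_b):
    """Group the clusters of two clusterings into connected components.

    Assumption (not taken from the record): a cluster of clustering A is
    linked to a cluster of clustering B when they share a data point.
    Returns a sorted list of (clusters_from_A, clusters_from_B) pairs.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data points")

    parent = {}  # union-find forest over nodes ("A", label) and ("B", label)

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    # Each data point contributes one edge between its two clusters.
    for a, b in zip(labels_a, labels_b):
        union(("A", a), ("B", b))

    # Collect the cluster labels of each side per component root.
    groups = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        side, label = node
        groups[find(node)][0 if side == "A" else 1].add(label)

    return sorted((sorted(a), sorted(b)) for a, b in groups.values())


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    guess = [0, 0, 0, 1, 2, 2]
    for part_a, part_b in bipartite_components(truth, guess):
        print(part_a, "<->", part_b)
```

In this toy input, the third data point ties clusters 0 and 1 of the first clustering to clusters 0 and 1 of the second, so they fall into one component, while cluster 2 matches cluster 2 exactly and forms its own component. A component-based measure would then score each component separately and combine the scores.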