A Split-Merge Framework for Comparing Clusterings

Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a gene...


Bibliographic Details
Main Authors:	Xiang, Qiaoliang; Mao, Qi; Chai, Kian Ming; Chieu, Hai Leong; Tsang, Ivor; Zhao, Zhendong
Format:	Article
Language:	eng
Subjects:	Computer Science - Learning; Statistics - Machine Learning
Online Access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Xiang, Qiaoliang; Mao, Qi; Chai, Kian Ming; Chieu, Hai Leong; Tsang, Ivor; Zhao, Zhendong
description	Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.
doi_str_mv	10.48550/arxiv.1206.6475
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1206_6475</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1206_6475</sourcerecordid><originalsourceid>FETCH-LOGICAL-a655-5894483f1fb9887710ea1ef41b1b9fc9266da979f2fee522649038cf5e49e1f73</originalsourceid><addsrcrecordid>eNotzrtug0AQheFtXES2-1TRvgBkZ9lraaHgRCJKEXq0kBkLGQxafMvbRySpzl8dfYw9gkiV01o8h3jvrilIYVKjrH5gsOOfU9-dk3eMB-RFDAPexnjkNEaej8MUYnc68Ly_zGdcct6wFYV-xu3_rllVvFT5a1J-7N_yXZkEo3WinVfKZQTUeOesBYEBkBQ00HhqvTTmK3jrSRKiltIoLzLXkkblEchma_b0d_tLrqfYDSF-1wu9XujZD75aPVg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A Split-Merge Framework for Comparing Clusterings</title><source>arXiv.org</source><creator>Xiang, Qiaoliang ; Mao, Qi ; Chai, Kian Ming ; Chieu, Hai Leong ; Tsang, Ivor ; Zhao, Zhendong</creator><creatorcontrib>Xiang, Qiaoliang ; Mao, Qi ; Chai, Kian Ming ; Chieu, Hai Leong ; Tsang, Ivor ; Zhao, Zhendong</creatorcontrib><description>Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. 
We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.</description><identifier>DOI: 10.48550/arxiv.1206.6475</identifier><language>eng</language><subject>Computer Science - Learning ; Statistics - Machine Learning</subject><creationdate>2012-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1206.6475$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1206.6475$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Xiang, Qiaoliang</creatorcontrib><creatorcontrib>Mao, Qi</creatorcontrib><creatorcontrib>Chai, Kian Ming</creatorcontrib><creatorcontrib>Chieu, Hai Leong</creatorcontrib><creatorcontrib>Tsang, Ivor</creatorcontrib><creatorcontrib>Zhao, Zhendong</creatorcontrib><title>A Split-Merge Framework for Comparing Clusterings</title><description>Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. 
Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.</description><subject>Computer Science - Learning</subject><subject>Statistics - Machine Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzrtug0AQheFtXES2-1TRvgBkZ9lraaHgRCJKEXq0kBkLGQxafMvbRySpzl8dfYw9gkiV01o8h3jvrilIYVKjrH5gsOOfU9-dk3eMB-RFDAPexnjkNEaej8MUYnc68Ly_zGdcct6wFYV-xu3_rllVvFT5a1J-7N_yXZkEo3WinVfKZQTUeOesBYEBkBQ00HhqvTTmK3jrSRKiltIoLzLXkkblEchma_b0d_tLrqfYDSF-1wu9XujZD75aPVg</recordid><startdate>20120627</startdate><enddate>20120627</enddate><creator>Xiang, Qiaoliang</creator><creator>Mao, Qi</creator><creator>Chai, Kian Ming</creator><creator>Chieu, Hai Leong</creator><creator>Tsang, Ivor</creator><creator>Zhao, Zhendong</creator><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20120627</creationdate><title>A Split-Merge Framework for Comparing Clusterings</title><author>Xiang, Qiaoliang ; Mao, Qi ; Chai, Kian Ming ; Chieu, Hai Leong ; Tsang, Ivor ; Zhao, Zhendong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a655-5894483f1fb9887710ea1ef41b1b9fc9266da979f2fee522649038cf5e49e1f73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Computer Science - Learning</topic><topic>Statistics - Machine Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Xiang, Qiaoliang</creatorcontrib><creatorcontrib>Mao, Qi</creatorcontrib><creatorcontrib>Chai, Kian Ming</creatorcontrib><creatorcontrib>Chieu, Hai Leong</creatorcontrib><creatorcontrib>Tsang, 
Ivor</creatorcontrib><creatorcontrib>Zhao, Zhendong</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xiang, Qiaoliang</au><au>Mao, Qi</au><au>Chai, Kian Ming</au><au>Chieu, Hai Leong</au><au>Tsang, Ivor</au><au>Zhao, Zhendong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Split-Merge Framework for Comparing Clusterings</atitle><date>2012-06-27</date><risdate>2012</risdate><abstract>Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.</abstract><doi>10.48550/arxiv.1206.6475</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1206.6475
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_1206_6475
source	arXiv.org
subjects	Computer Science - Learning; Statistics - Machine Learning
title	A Split-Merge Framework for Comparing Clusterings
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T04%3A07%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Split-Merge%20Framework%20for%20Comparing%20Clusterings&rft.au=Xiang,%20Qiaoliang&rft.date=2012-06-27&rft_id=info:doi/10.48550/arxiv.1206.6475&rft_dat=%3Carxiv_GOX%3E1206_6475%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
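
The description above says the relation between two clusterings is modeled as a bipartite graph and decomposed by the components of that graph. The sketch below is only an illustration of that idea, not the authors' formula or code. It assumes that each clustering is given as a list of cluster labels over the same data points, and that two clusters are joined by an edge when they share at least one point. The function name bipartite_components and the returned dictionary layout are hypothetical.

```python
from collections import defaultdict


def bipartite_components(labels_a, labels_b):
    """Group the clusters of two clusterings into connected components.

    Assumption for this sketch: nodes are clusters, and an edge joins a
    cluster of the first clustering to a cluster of the second clustering
    whenever the two clusters share at least one data point.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data points")

    # Union-find over cluster nodes; the "A"/"B" tag keeps the two sides apart.
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(u, v):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    # Every data point contributes one edge between its two clusters.
    for a, b in zip(labels_a, labels_b):
        union(("A", a), ("B", b))

    # Collect, per component, the clusters on each side and the point indices.
    components = defaultdict(lambda: {"A": set(), "B": set(), "points": []})
    for index, (a, b) in enumerate(zip(labels_a, labels_b)):
        root = find(("A", a))
        components[root]["A"].add(a)
        components[root]["B"].add(b)
        components[root]["points"].append(index)
    return list(components.values())


if __name__ == "__main__":
    first = [0, 0, 1, 1, 2, 2]
    second = [0, 0, 0, 1, 2, 2]
    for component in bipartite_components(first, second):
        print(component)
```

A component-based measure could then score each component separately and combine the scores. How the scores are defined and weighted is left open here, because the record does not state the formula.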