A correlation-based fuzzy cluster validity index with secondary options detector

The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has bee...

Bibliographic details
Main authors:	Wiroonsri, Nathakhun ; Preedasawakul, Onthada
Format:	Article
Language:	eng
Subjects:	Computer Science - Learning ; Statistics - Machine Learning

creator	Wiroonsri, Nathakhun ; Preedasawakul, Onthada
description	The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.
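As a rough illustration of the idea in the description above, the following Python sketch correlates the pairwise distances between data points with the pairwise distances between per-point "adjusted centroids". It is not the WP index itself: this record does not give the exact definition, so the adjusted centroid is assumed here to be the membership-weighted mean of the cluster centroids, and the names `pairwise_distances` and `correlation_index_sketch` are hypothetical. The authors' own implementation is the R package UniversalCVI.

```python
import numpy as np


def pairwise_distances(points):
    """Return Euclidean distances for all unordered pairs (i < j)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j]


def correlation_index_sketch(X, centroids, memberships):
    """Pearson correlation between data distances and adjusted-centroid distances.

    ASSUMPTION (not taken from the paper): the adjusted centroid of a point
    is the membership-weighted mean of the cluster centroids.
    X: (n, d) data, centroids: (c, d), memberships: (n, c) with rows summing to 1.
    """
    adjusted = memberships @ centroids      # (n, d), one adjusted centroid per point
    d_data = pairwise_distances(X)          # actual distances between pairs
    d_adjusted = pairwise_distances(adjusted)
    return float(np.corrcoef(d_data, d_adjusted)[0, 1])


# Toy example: two well-separated groups with crisp memberships.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids = np.array([X[:3].mean(axis=0), X[3:].mean(axis=0)])
memberships = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
score = correlation_index_sketch(X, centroids, memberships)
print(score)
```

Under this assumption, a partition that matches the structure of the data gives a correlation close to 1, while a partition that mixes the groups gives a much lower value. In practice one would compute the score for each candidate number of clusters after running fuzzy c-means and compare the values.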
doi_str_mv	10.48550/arxiv.2308.14785
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2308_14785</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2308_14785</sourcerecordid><originalsourceid>FETCH-LOGICAL-a675-4795c248d2f6eb3d8fb5cae1b8390f410ffe43609954fbf3c07bcab90c1049a73</originalsourceid><addsrcrecordid>eNotz7tuwyAYBWCWDlXSB-hUXsAuGDAwRlFvUqRmyG5x-VGRHBMBSeM8fZu005nO0fkQeqSk5UoI8mzyOZ7ajhHVUi6VuEfbFXYpZxhNjWlqrCngcTheLjN247FUyPhkxuhjnXGcPJzxd6xfuIBLkzd5xulwLRbsoYKrKS_RXTBjgYf_XKDd68tu_d5sPt8-1qtNY3opGi61cB1Xvgs9WOZVsMIZoFYxTQKnJATgrCdaCx5sYI5I64zVxFHCtZFsgZ7-Zm-k4ZDj_vfNcKUNNxr7ASguSys</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A correlation-based fuzzy cluster validity index with secondary options detector</title><source>arXiv.org</source><creator>Wiroonsri, Nathakhun ; Preedasawakul, Onthada</creator><creatorcontrib>Wiroonsri, Nathakhun ; Preedasawakul, Onthada</creatorcontrib><description>The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. 
We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.</description><identifier>DOI: 10.48550/arxiv.2308.14785</identifier><language>eng</language><subject>Computer Science - Learning ; Statistics - Machine Learning</subject><creationdate>2023-08</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2308.14785$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2308.14785$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wiroonsri, Nathakhun</creatorcontrib><creatorcontrib>Preedasawakul, Onthada</creatorcontrib><title>A correlation-based fuzzy cluster validity index with secondary options detector</title><description>The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. 
In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.</description><subject>Computer Science - Learning</subject><subject>Statistics - Machine Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz7tuwyAYBWCWDlXSB-hUXsAuGDAwRlFvUqRmyG5x-VGRHBMBSeM8fZu005nO0fkQeqSk5UoI8mzyOZ7ajhHVUi6VuEfbFXYpZxhNjWlqrCngcTheLjN247FUyPhkxuhjnXGcPJzxd6xfuIBLkzd5xulwLRbsoYKrKS_RXTBjgYf_XKDd68tu_d5sPt8-1qtNY3opGi61cB1Xvgs9WOZVsMIZoFYxTQKnJATgrCdaCx5sYI5I64zVxFHCtZFsgZ7-Zm-k4ZDj_vfNcKUNNxr7ASguSys</recordid><startdate>20230828</startdate><enddate>20230828</enddate><creator>Wiroonsri, Nathakhun</creator><creator>Preedasawakul, Onthada</creator><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20230828</creationdate><title>A correlation-based fuzzy cluster validity index with secondary options detector</title><author>Wiroonsri, Nathakhun ; Preedasawakul, 
Onthada</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a675-4795c248d2f6eb3d8fb5cae1b8390f410ffe43609954fbf3c07bcab90c1049a73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Learning</topic><topic>Statistics - Machine Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Wiroonsri, Nathakhun</creatorcontrib><creatorcontrib>Preedasawakul, Onthada</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wiroonsri, Nathakhun</au><au>Preedasawakul, Onthada</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A correlation-based fuzzy cluster validity index with secondary options detector</atitle><date>2023-08-28</date><risdate>2023</risdate><abstract>The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. 
We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.</abstract><doi>10.48550/arxiv.2308.14785</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2308.14785
language	eng
recordid	cdi_arxiv_primary_2308_14785
source	arXiv.org
subjects	Computer Science - Learning ; Statistics - Machine Learning
title	A correlation-based fuzzy cluster validity index with secondary options detector
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T09%3A38%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20correlation-based%20fuzzy%20cluster%20validity%20index%20with%20secondary%20options%20detector&rft.au=Wiroonsri,%20Nathakhun&rft.date=2023-08-28&rft_id=info:doi/10.48550/arxiv.2308.14785&rft_dat=%3Carxiv_GOX%3E2308_14785%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true