Recovering the number of clusters in data sets with noise features using feature rescaling factors

In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure...

Bibliographic details
Published in: Information sciences 2015-12, Vol.324, p.126-145
Main authors: de Amorim, Renato Cordeiro; Hennig, Christian
Format: Article
Language: English
Online access: Full text
container_end_page 145
container_issue
container_start_page 126
container_title Information sciences
container_volume 324
creator de Amorim, Renato Cordeiro
Hennig, Christian
description In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.
doi_str_mv 10.1016/j.ins.2015.06.039
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1778019013</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0020025515004715</els_id><sourcerecordid>1778019013</sourcerecordid><originalsourceid>FETCH-LOGICAL-c373t-6d924cf9363d13d877432ba6362a1319f3f5eeea0ecc63e243010917cd85a7843</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-AG85emmdJG3S4knEL1gQRM8hm07dLN1WM-mK_96uu2dPw7y8z8A8jF0KyAUIfb3OQ0-5BFHmoHNQ9RGbicrITMtaHLMZgIQMZFmesjOiNQAURusZW76iH7YYQ__B0wp5P26WGPnQct-NlDASDz1vXHKcMBH_DmnF-yEQ8hZdGiMSH2lHH1Y-Jd51f4nzaYh0zk5a1xFeHOacvT_cv909ZYuXx-e720XmlVEp000tC9_WSqtGqKYyplBy6bTS0gkl6la1JSI6QO-1QlkoEFAL45uqdKYq1Jxd7e9-xuFrREp2E8hj17keh5GsMKYCUYNQU1Xsqz4ORBFb-xnDxsUfK8DufNq1nXzanU8L2k4-J-Zmz-D0wzZgtOQD9h6bENEn2wzhH_oX9H1-oQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1778019013</pqid></control><display><type>article</type><title>Recovering the number of clusters in data sets with noise features using feature rescaling factors</title><source>Elsevier ScienceDirect Journals Complete</source><creator>de Amorim, Renato Cordeiro ; Hennig, Christian</creator><creatorcontrib>de Amorim, Renato Cordeiro ; Hennig, Christian</creatorcontrib><description>In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. 
We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.</description><identifier>ISSN: 0020-0255</identifier><identifier>EISSN: 1872-6291</identifier><identifier>DOI: 10.1016/j.ins.2015.06.039</identifier><language>eng</language><publisher>Elsevier Inc</publisher><subject>Cluster validity index ; Clustering ; Clusters ; Estimating ; Feature re-scaling ; Feature weighting ; Gaussian ; K-Means ; Noise ; Recovering ; Rescaling</subject><ispartof>Information sciences, 2015-12, Vol.324, p.126-145</ispartof><rights>2015 Elsevier Inc.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c373t-6d924cf9363d13d877432ba6362a1319f3f5eeea0ecc63e243010917cd85a7843</citedby><cites>FETCH-LOGICAL-c373t-6d924cf9363d13d877432ba6362a1319f3f5eeea0ecc63e243010917cd85a7843</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0020025515004715$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>de Amorim, Renato Cordeiro</creatorcontrib><creatorcontrib>Hennig, Christian</creatorcontrib><title>Recovering the number of clusters in data sets with noise features using feature rescaling factors</title><title>Information sciences</title><description>In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. 
We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.</description><subject>Cluster validity index</subject><subject>Clustering</subject><subject>Clusters</subject><subject>Estimating</subject><subject>Feature re-scaling</subject><subject>Feature weighting</subject><subject>Gaussian</subject><subject>K-Means</subject><subject>Noise</subject><subject>Recovering</subject><subject>Rescaling</subject><issn>0020-0255</issn><issn>1872-6291</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-AG85emmdJG3S4knEL1gQRM8hm07dLN1WM-mK_96uu2dPw7y8z8A8jF0KyAUIfb3OQ0-5BFHmoHNQ9RGbicrITMtaHLMZgIQMZFmesjOiNQAURusZW76iH7YYQ__B0wp5P26WGPnQct-NlDASDz1vXHKcMBH_DmnF-yEQ8hZdGiMSH2lHH1Y-Jd51f4nzaYh0zk5a1xFeHOacvT_cv909ZYuXx-e720XmlVEp000tC9_WSqtGqKYyplBy6bTS0gkl6la1JSI6QO-1QlkoEFAL45uqdKYq1Jxd7e9-xuFrREp2E8hj17keh5GsMKYCUYNQU1Xsqz4ORBFb-xnDxsUfK8DufNq1nXzanU8L2k4-J-Zmz-D0wzZgtOQD9h6bENEn2wzhH_oX9H1-oQ</recordid><startdate>20151210</startdate><enddate>20151210</enddate><creator>de Amorim, Renato Cordeiro</creator><creator>Hennig, Christian</creator><general>Elsevier Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20151210</creationdate><title>Recovering the number of clusters in data sets with noise features using feature rescaling factors</title><author>de Amorim, Renato Cordeiro ; Hennig, 
Christian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c373t-6d924cf9363d13d877432ba6362a1319f3f5eeea0ecc63e243010917cd85a7843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Cluster validity index</topic><topic>Clustering</topic><topic>Clusters</topic><topic>Estimating</topic><topic>Feature re-scaling</topic><topic>Feature weighting</topic><topic>Gaussian</topic><topic>K-Means</topic><topic>Noise</topic><topic>Recovering</topic><topic>Rescaling</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>de Amorim, Renato Cordeiro</creatorcontrib><creatorcontrib>Hennig, Christian</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Information sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>de Amorim, Renato Cordeiro</au><au>Hennig, Christian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Recovering the number of clusters in data sets with noise features using feature rescaling factors</atitle><jtitle>Information sciences</jtitle><date>2015-12-10</date><risdate>2015</risdate><volume>324</volume><spage>126</spage><epage>145</epage><pages>126-145</pages><issn>0020-0255</issn><eissn>1872-6291</eissn><abstract>In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian 
clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.</abstract><pub>Elsevier Inc</pub><doi>10.1016/j.ins.2015.06.039</doi><tpages>20</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0020-0255
ispartof Information sciences, 2015-12, Vol.324, p.126-145
issn 0020-0255
1872-6291
language eng
recordid cdi_proquest_miscellaneous_1778019013
source Elsevier ScienceDirect Journals Complete
subjects Cluster validity index
Clustering
Clusters
Estimating
Feature re-scaling
Feature weighting
Gaussian
K-Means
Noise
Recovering
Rescaling
title Recovering the number of clusters in data sets with noise features using feature rescaling factors
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T22%3A41%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recovering%20the%20number%20of%20clusters%20in%20data%20sets%20with%20noise%20features%20using%20feature%20rescaling%20factors&rft.jtitle=Information%20sciences&rft.au=de%20Amorim,%20Renato%20Cordeiro&rft.date=2015-12-10&rft.volume=324&rft.spage=126&rft.epage=145&rft.pages=126-145&rft.issn=0020-0255&rft.eissn=1872-6291&rft_id=info:doi/10.1016/j.ins.2015.06.039&rft_dat=%3Cproquest_cross%3E1778019013%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1778019013&rft_id=info:pmid/&rft_els_id=S0020025515004715&rfr_iscdi=true