Recovering the number of clusters in data sets with noise features using feature rescaling factors
In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure...
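Saved in:
Published in: | Information sciences 2015-12, Vol.324, p.126-145 |
---|---|
Main authors: | de Amorim, Renato Cordeiro ; Hennig, Christian |
Format: | Article |
Language: | eng |
Subjects: | Cluster validity index ; Clustering ; Clusters ; Estimating ; Feature re-scaling ; Feature weighting ; Gaussian ; K-Means ; Noise ; Recovering ; Rescaling |
Online access: | Full text |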
container_end_page | 145 |
---|---|
container_issue | |
container_start_page | 126 |
container_title | Information sciences |
container_volume | 324 |
creator | de Amorim, Renato Cordeiro ; Hennig, Christian |
description | In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters.
We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set. |
doi_str_mv | 10.1016/j.ins.2015.06.039 |
format | Article |
publisher | Elsevier Inc |
rights | 2015 Elsevier Inc. |
date | 2015-12-10 |
tpages | 20 |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 0020-0255 |
ispartof | Information sciences, 2015-12, Vol.324, p.126-145 |
issn | 0020-0255 ; 1872-6291 |
language | eng |
recordid | cdi_proquest_miscellaneous_1778019013 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Cluster validity index ; Clustering ; Clusters ; Estimating ; Feature re-scaling ; Feature weighting ; Gaussian ; K-Means ; Noise ; Recovering ; Rescaling |
title | Recovering the number of clusters in data sets with noise features using feature rescaling factors |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T22%3A41%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recovering%20the%20number%20of%20clusters%20in%20data%20sets%20with%20noise%20features%20using%20feature%20rescaling%20factors&rft.jtitle=Information%20sciences&rft.au=de%20Amorim,%20Renato%20Cordeiro&rft.date=2015-12-10&rft.volume=324&rft.spage=126&rft.epage=145&rft.pages=126-145&rft.issn=0020-0255&rft.eissn=1872-6291&rft_id=info:doi/10.1016/j.ins.2015.06.039&rft_dat=%3Cproquest_cross%3E1778019013%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1778019013&rft_id=info:pmid/&rft_els_id=S0020025515004715&rfr_iscdi=true |
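
The abstract describes using cluster validity indexes (such as Calinski–Harabasz) to estimate the number of spherical clusters. The following is a minimal sketch of that general idea only, not the authors' feature re-scaling methods: it runs a plain k-means for several candidate values of k and keeps the k with the highest Calinski–Harabasz score. The toy data, the farthest-first initialisation, and all function names (`kmeans`, `calinski_harabasz`, `centroid`, `sq_dist`) are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, max_iter=100):
    """Plain k-means with deterministic farthest-first initialisation
    (an illustrative choice, not taken from the paper)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion,
    each divided by its degrees of freedom (k - 1 and n - k)."""
    n = len(points)
    cluster_ids = sorted(set(labels))
    k = len(cluster_ids)
    overall = centroid(points)
    between = within = 0.0
    for j in cluster_ids:
        members = [p for p, lab in zip(points, labels) if lab == j]
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three well-separated groups of five 2-D points.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + dx, cy + dy) for cx, cy in true_centers for dx, dy in offsets]

# Score each candidate k and keep the one with the highest index value.
scores = {k: calinski_harabasz(data, kmeans(data, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this noise-free toy data the index peaks at k = 3. The setting studied in the paper adds noise features, which is where an unweighted index of this kind can be misled and where re-scaling the features beforehand is meant to help.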