Evolutionary k-means for distributed data sets
Published in: | Neurocomputing (Amsterdam) 2014-03, Vol.127, p.30-42 |
---|---|
Main authors: | Naldi, M.C. ; Campello, R.J.G.B. |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 42 |
---|---|
container_issue | |
container_start_page | 30 |
container_title | Neurocomputing (Amsterdam) |
container_volume | 127 |
creator | Naldi, M.C. ; Campello, R.J.G.B. |
description | One of the challenges for clustering lies in dealing with data distributed across separate repositories, because most clustering techniques require the data to be centralized. One such technique, k-means, has been named one of the most influential data mining algorithms because it is simple, scalable and easily adapted to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. In this paper, we propose the use of evolutionary algorithms to overcome these k-means limitations and, at the same time, to deal with distributed data. Two different distribution approaches are adopted: the first obtains a final model identical to that of the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared from two perspectives: theoretically, through asymptotic complexity analyses; and experimentally, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The results indicate which variant is best suited to each application scenario. |
doi_str_mv | 10.1016/j.neucom.2013.05.046 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1793285213</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0925231213007674</els_id><sourcerecordid>1793285213</sourcerecordid><originalsourceid>FETCH-LOGICAL-c369t-a217dfcebd355d4f501e3498ac525c6958eac28929b5b93c915a314aa744c82e3</originalsourceid><addsrcrecordid>eNp9kEtLxDAUhYMoOI7-AxfdCG5a82yTjSDD-IABN7oOaXILGdtmTNIB_70dZnDp6m6-cw_nQ-iW4IpgUj9sqxEmG4aKYsIqLCrM6zO0ILKhpaSyPkcLrKgoKSP0El2ltMWYNISqBarW-9BP2YfRxJ_iqxzAjKnoQiycTzn6dsrgCmeyKRLkdI0uOtMnuDndJfp8Xn-sXsvN-8vb6mlTWlarXBpKGtdZaB0TwvFOYAKMK2msoMLWSkgwlkpFVStaxawiwjDCjWk4t5ICW6L7499dDN8TpKwHnyz0vRkhTEmTRjEqBSVsRvkRtTGkFKHTu-iHeY0mWB_06K0-6tEHPRoLPeuZY3enBpOs6btoRuvTX5bO3jgjeOYejxzMc_ceok7Ww2jB-Qg2axf8_0W_OgF8Dg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1793285213</pqid></control><display><type>article</type><title>Evolutionary k-means for distributed data sets</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Naldi, M.C. ; Campello, R.J.G.B.</creator><creatorcontrib>Naldi, M.C. ; Campello, R.J.G.B.</creatorcontrib><description>One of the challenges for clustering resides in dealing with data distributed in separated repositories, because most clustering techniques require the data to be centralized. One of them, k-means, has been elected as one of the most influential data mining algorithms for being simple, scalable and easily modifiable to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. In this paper, we propose the use of evolutionary algorithms to overcome the k-means limitations and, at the same time, to deal with distributed data. 
Two different distribution approaches are adopted: the first obtains a final model identical to the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared experimentally from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The obtained results indicate which variant is more adequate for each application scenario.</description><identifier>ISSN: 0925-2312</identifier><identifier>EISSN: 1872-8286</identifier><identifier>DOI: 10.1016/j.neucom.2013.05.046</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Algorithms ; Applied sciences ; Asymptotic properties ; Clustering ; Clusters ; Collection ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Dealing ; Distributed clustering ; Distributed data mining ; Evolutionary k-means ; Exact sciences and technology ; Memory organisation. 
Data processing ; Software ; Statistical tests</subject><ispartof>Neurocomputing (Amsterdam), 2014-03, Vol.127, p.30-42</ispartof><rights>2013 Elsevier B.V.</rights><rights>2015 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c369t-a217dfcebd355d4f501e3498ac525c6958eac28929b5b93c915a314aa744c82e3</citedby><cites>FETCH-LOGICAL-c369t-a217dfcebd355d4f501e3498ac525c6958eac28929b5b93c915a314aa744c82e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.neucom.2013.05.046$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>309,310,314,780,784,789,790,3550,23930,23931,25140,27924,27925,45995</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=28284310$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Naldi, M.C.</creatorcontrib><creatorcontrib>Campello, R.J.G.B.</creatorcontrib><title>Evolutionary k-means for distributed data sets</title><title>Neurocomputing (Amsterdam)</title><description>One of the challenges for clustering resides in dealing with data distributed in separated repositories, because most clustering techniques require the data to be centralized. One of them, k-means, has been elected as one of the most influential data mining algorithms for being simple, scalable and easily modifiable to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. In this paper, we propose the use of evolutionary algorithms to overcome the k-means limitations and, at the same time, to deal with distributed data. 
Two different distribution approaches are adopted: the first obtains a final model identical to the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared experimentally from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The obtained results indicate which variant is more adequate for each application scenario.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Asymptotic properties</subject><subject>Clustering</subject><subject>Clusters</subject><subject>Collection</subject><subject>Computer science; control theory; systems</subject><subject>Data processing. List processing. Character string processing</subject><subject>Dealing</subject><subject>Distributed clustering</subject><subject>Distributed data mining</subject><subject>Evolutionary k-means</subject><subject>Exact sciences and technology</subject><subject>Memory organisation. 
Data processing</subject><subject>Software</subject><subject>Statistical tests</subject><issn>0925-2312</issn><issn>1872-8286</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAUhYMoOI7-AxfdCG5a82yTjSDD-IABN7oOaXILGdtmTNIB_70dZnDp6m6-cw_nQ-iW4IpgUj9sqxEmG4aKYsIqLCrM6zO0ILKhpaSyPkcLrKgoKSP0El2ltMWYNISqBarW-9BP2YfRxJ_iqxzAjKnoQiycTzn6dsrgCmeyKRLkdI0uOtMnuDndJfp8Xn-sXsvN-8vb6mlTWlarXBpKGtdZaB0TwvFOYAKMK2msoMLWSkgwlkpFVStaxawiwjDCjWk4t5ICW6L7499dDN8TpKwHnyz0vRkhTEmTRjEqBSVsRvkRtTGkFKHTu-iHeY0mWB_06K0-6tEHPRoLPeuZY3enBpOs6btoRuvTX5bO3jgjeOYejxzMc_ceok7Ww2jB-Qg2axf8_0W_OgF8Dg</recordid><startdate>20140315</startdate><enddate>20140315</enddate><creator>Naldi, M.C.</creator><creator>Campello, R.J.G.B.</creator><general>Elsevier B.V</general><general>Elsevier</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20140315</creationdate><title>Evolutionary k-means for distributed data sets</title><author>Naldi, M.C. ; Campello, R.J.G.B.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c369t-a217dfcebd355d4f501e3498ac525c6958eac28929b5b93c915a314aa744c82e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Asymptotic properties</topic><topic>Clustering</topic><topic>Clusters</topic><topic>Collection</topic><topic>Computer science; control theory; systems</topic><topic>Data processing. List processing. Character string processing</topic><topic>Dealing</topic><topic>Distributed clustering</topic><topic>Distributed data mining</topic><topic>Evolutionary k-means</topic><topic>Exact sciences and technology</topic><topic>Memory organisation. 
Data processing</topic><topic>Software</topic><topic>Statistical tests</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Naldi, M.C.</creatorcontrib><creatorcontrib>Campello, R.J.G.B.</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Neurocomputing (Amsterdam)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Naldi, M.C.</au><au>Campello, R.J.G.B.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evolutionary k-means for distributed data sets</atitle><jtitle>Neurocomputing (Amsterdam)</jtitle><date>2014-03-15</date><risdate>2014</risdate><volume>127</volume><spage>30</spage><epage>42</epage><pages>30-42</pages><issn>0925-2312</issn><eissn>1872-8286</eissn><abstract>One of the challenges for clustering resides in dealing with data distributed in separated repositories, because most clustering techniques require the data to be centralized. One of them, k-means, has been elected as one of the most influential data mining algorithms for being simple, scalable and easily modifiable to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. 
In this paper, we propose the use of evolutionary algorithms to overcome the k-means limitations and, at the same time, to deal with distributed data. Two different distribution approaches are adopted: the first obtains a final model identical to the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared experimentally from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The obtained results indicate which variant is more adequate for each application scenario.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.neucom.2013.05.046</doi><tpages>13</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0925-2312 |
ispartof | Neurocomputing (Amsterdam), 2014-03, Vol.127, p.30-42 |
issn | 0925-2312 1872-8286 |
language | eng |
recordid | cdi_proquest_miscellaneous_1793285213 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Algorithms ; Applied sciences ; Asymptotic properties ; Clustering ; Clusters ; Collection ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Dealing ; Distributed clustering ; Distributed data mining ; Evolutionary k-means ; Exact sciences and technology ; Memory organisation. Data processing ; Software ; Statistical tests |
title | Evolutionary k-means for distributed data sets |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T01%3A00%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evolutionary%20k-means%20for%20distributed%20data%20sets&rft.jtitle=Neurocomputing%20(Amsterdam)&rft.au=Naldi,%20M.C.&rft.date=2014-03-15&rft.volume=127&rft.spage=30&rft.epage=42&rft.pages=30-42&rft.issn=0925-2312&rft.eissn=1872-8286&rft_id=info:doi/10.1016/j.neucom.2013.05.046&rft_dat=%3Cproquest_cross%3E1793285213%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1793285213&rft_id=info:pmid/&rft_els_id=S0925231213007674&rfr_iscdi=true |
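
The abstract states that the first distribution approach yields a final model identical to that of the centralized clustering algorithm. This record does not describe the paper's actual protocol, so what follows is only a minimal sketch under an assumption. It assumes a coordinator runs Lloyd-style k-means iterations while each site returns, for its own data subset, per-cluster coordinate sums and point counts. Because the centroid update needs only those totals, the merged result matches plain k-means on the pooled data, up to floating-point rounding. The evolutionary search over initial prototypes and the number of clusters is not shown. The function names `local_stats`, `merge_and_update`, `distributed_kmeans` and `centralized_kmeans` are hypothetical.

```python
import numpy as np

# Hypothetical sketch (assumption): exact distributed k-means via per-cluster
# sums and counts. This is a standard technique, not the paper's own code.

def local_stats(X, centroids):
    """Site step: assign local points to their nearest centroid and return
    per-cluster coordinate sums and point counts (raw points stay local)."""
    # squared Euclidean distance from every point to every centroid, shape (n, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    np.add.at(sums, labels, X)            # sums[labels[i]] += X[i] for every i
    counts = np.bincount(labels, minlength=k)
    return sums, counts

def merge_and_update(stats, centroids):
    """Coordinator step: total the statistics from all sites and recompute
    centroids; an empty cluster keeps its previous centroid."""
    total_sums = sum(s for s, _ in stats)
    total_counts = sum(c for _, c in stats)
    new = centroids.copy()
    nonempty = total_counts > 0
    new[nonempty] = total_sums[nonempty] / total_counts[nonempty][:, None]
    return new

def distributed_kmeans(partitions, init_centroids, n_iter=10):
    """Run n_iter Lloyd iterations over data split across several sites."""
    centroids = np.asarray(init_centroids, dtype=float)
    for _ in range(n_iter):
        stats = [local_stats(X, centroids) for X in partitions]
        centroids = merge_and_update(stats, centroids)
    return centroids

def centralized_kmeans(X, init_centroids, n_iter=10):
    """Plain Lloyd's k-means on the pooled data, for comparison."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:          # empty cluster: keep old centroid
                centroids[j] = members.mean(axis=0)
    return centroids
```

In each iteration, every site sends only a k × d array of sums and k counts. The communication cost therefore does not grow with the number of objects a site holds.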