Ensemble based distributed soft clustering

Bibliographic details
Main authors: Visalakshi, N.K.; Thangavel, K.
Format: Conference paper
Language: English
Description
Abstract: Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches to distributed knowledge discovery and data mining. A distributed clustering algorithm clusters distributed datasets without necessarily downloading all the data to a single site. Many applications can benefit from soft clustering, where each object is assigned to multiple clusters with membership weights that sum to one. In this paper, a novel distributed soft clustering algorithm based on ensemble learning is proposed; it modifies the existing distributed K-Means algorithm to attain high-quality soft clusters. The proposed algorithm clusters multiple homogeneous data sources distributed over several local sites by combining the local clustering results. The fuzzy C-Means algorithm is used to cluster the local datasets, and the centroids of the individual datasets form an ensemble. The global centroids are obtained at a central site by clustering the local centroids with the K-Means algorithm using a global K value. The local soft clusters are then updated using the global centroids. Experiments on various datasets from the UCI Machine Learning Repository compare the performance of the proposed algorithm with that of the conventional centralized fuzzy C-Means clustering algorithm.
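The three-stage pipeline described in the abstract (local fuzzy C-Means, K-Means over the ensemble of local centroids, then a local membership update) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fuzzifier m = 2, the number of local clusters per site (`local_c`), the farthest-point initialization, the stopping rules, and all function names (`fuzzy_c_means`, `k_means`, `distributed_soft_clustering`, etc.) are hypothetical choices made here. The final step assumes the standard FCM membership formula is simply reused against the global centroids.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k):
    """Deterministic farthest-point seeding (assumed; the abstract does not specify one)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def fcm_memberships(points, centroids, m=2.0, eps=1e-12):
    """Fuzzy C-Means membership update; each row of weights sums to one."""
    u = []
    for p in points:
        d = [max(sq_dist(p, c), eps) for c in centroids]
        # u_ij = 1 / sum_k (d_ij / d_ik)^(1/(m-1)), with d = squared distance
        u.append([1.0 / sum((dj / dk) ** (1.0 / (m - 1)) for dk in d) for dj in d])
    return u


def fcm_centroids(points, u, m=2.0):
    """Centroid update: mean of the points weighted by membership^m."""
    cents = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        cents.append([sum(wi * p[k] for wi, p in zip(w, points)) / total
                      for k in range(len(points[0]))])
    return cents


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-9):
    """Local soft clustering run independently at one site."""
    centroids = maximin_init(points, c)
    for _ in range(iters):
        u = fcm_memberships(points, centroids, m)
        new = fcm_centroids(points, u, m)
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


def k_means(points, k, iters=100):
    """Plain Lloyd's K-Means, used here at the central site on the centroid ensemble."""
    centroids = maximin_init(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda idx: sq_dist(p, centroids[idx]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def distributed_soft_clustering(sites, local_c, global_k, m=2.0):
    # 1. Each site clusters its own data with FCM; only centroids leave the site.
    ensemble = []
    for data in sites:
        ensemble.extend(fuzzy_c_means(data, local_c, m))
    # 2. The central site clusters the ensemble of local centroids with K-Means.
    global_centroids = k_means(ensemble, global_k)
    # 3. Each site recomputes soft memberships against the global centroids.
    memberships = [fcm_memberships(data, global_centroids, m) for data in sites]
    return global_centroids, memberships


if __name__ == "__main__":
    rng = random.Random(42)

    def blob(cx, cy, n):
        return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for _ in range(n)]

    # Three homogeneous sites, each holding synthetic points from the same two groups.
    sites = [blob(0, 0, 30) + blob(10, 10, 30) for _ in range(3)]
    centroids, memberships = distributed_soft_clustering(sites, local_c=2, global_k=2)
    print([[round(x, 2) for x in c] for c in sorted(centroids)])
    print([round(w, 3) for w in memberships[0][0]])
```

Only the local centroids are sent to the central site, which is what lets the approach avoid downloading all the data to one place. To mirror the comparison in the abstract, one could also run `fuzzy_c_means` on the concatenation of all site data and compare the resulting centroids with the distributed ones.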
DOI: 10.1109/ICCCNET.2008.4787679