Three-Way Ensemble Clustering for Incomplete Data

Bibliographic details
Published in: IEEE Access 2020, Vol. 8, pp. 91855-91864
Main authors: Wang, Pingxin; Chen, Xiangjian
Format: Article
Language: English
Description
Abstract: Incomplete data sets occur in all fields of scientific study because of random noise, data loss, limitations of data acquisition, misunderstanding of data, etc. Most clustering algorithms cannot be applied to incomplete data sets directly, because objects with missing values must first be preprocessed. In this paper, we present a new imputation algorithm for incomplete data and a three-way ensemble clustering algorithm based on the imputation result. In the proposed imputation algorithm, the objects without missing values are first clustered using hard clustering methods. For each object with missing values, the mean attribute value of each cluster is used in turn to fill the missing attribute value. Perturbation analysis of the cluster centroids is then applied to search for the optimal imputation. As an application of the proposed imputation method, we develop a three-way ensemble clustering algorithm using the ideas of clustering ensembles and three-way decision. Objects that receive the same cluster label in the different clustering results are assigned to the core region of the corresponding cluster, while objects that receive different cluster labels are assigned to the fringe region. A three-way clustering is therefore formed naturally. Experimental results on UCI data sets show that the algorithm is effective in revealing cluster structures.
ISSN:2169-3536
DOI:10.1109/ACCESS.2020.2994380
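
The core/fringe rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the base clusterings are already label-aligned (label k denotes the same cluster in every result), and it assumes that an object with conflicting labels is placed in the fringe of every cluster it was assigned to; the abstract does not specify either point. The function name `three_way_regions` and the toy labelings are hypothetical.

```python
from collections import defaultdict


def three_way_regions(labelings):
    """Split object indices into core and fringe regions per cluster.

    labelings: list of label lists, one per base clustering, all of equal
    length. Assumption: labels are already aligned across clusterings.
    """
    n = len(labelings[0])
    core = defaultdict(set)
    fringe = defaultdict(set)
    for i in range(n):
        # Distinct labels that the base clusterings gave to object i.
        votes = {labels[i] for labels in labelings}
        if len(votes) == 1:
            # All clusterings agree: object i is in the core of that cluster.
            core[votes.pop()].add(i)
        else:
            # Disagreement (assumption): object i joins the fringe of every
            # cluster it was assigned to by at least one clustering.
            for k in votes:
                fringe[k].add(i)
    return dict(core), dict(fringe)


# Toy example: three base clusterings of five objects.
labelings = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
core, fringe = three_way_regions(labelings)
print(core)
print(fringe)
```

In this toy input only object 2 receives conflicting labels, so it lands in the fringe of both clusters while the other objects form the cores.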