Sparse probabilistic K-means
Published in: | Applied mathematics and computation 2020-10, Vol.382, p.125328, Article 125328 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | • Page 7: The statement and the proof of Theorem 3.2 are modified. • Page 9: The statement and the proof of Theorem 3.2 are modified. • Page 21: Table 2 is added, and the comments on Table 2 are in the second paragraph.
The goal of clustering is to partition a set of data points into groups of similar data points, called clusters. Clustering algorithms can be classified into two categories: hard and soft clustering. Hard clustering assigns each data point to exactly one cluster, whereas soft clustering allows probabilistic assignments to clusters. In this paper, we propose a new model that combines the benefits of both: the clarity of hard clustering and the probabilistic assignments of soft clustering. Since most data points usually have a clear association with one cluster, only a few points require a probabilistic interpretation. Thus, we apply an ℓ1 norm constraint to impose sparsity on the probabilistic assignments. Moreover, we incorporate outlier detection into the clustering model, so that outliers, which can cause serious problems in statistical analyses, are detected simultaneously. To optimize the model, we introduce an alternating minimization method and prove its convergence. Numerical experiments and comparisons with existing models show the soundness and effectiveness of the proposed model. |
---|---|
ISSN: | 0096-3003 1873-5649 |
DOI: | 10.1016/j.amc.2020.125328 |
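
The distinction the abstract draws between hard, soft, and sparse probabilistic assignments can be illustrated with a small sketch. This is not the model or the algorithm of the paper: the function names, the softmax-style soft assignment, the `temperature` parameter, and the thresholding step used to obtain sparsity are all assumptions made for illustration, standing in for the ℓ1-constrained formulation that the abstract mentions but does not spell out.

```python
import math

# Two hypothetical cluster centers in the plane (illustrative values only).
centers = [(0.0, 0.0), (4.0, 0.0)]


def squared_distances(point, centers):
    """Squared Euclidean distance from one point to every center."""
    return [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]


def hard_assignment(point, centers):
    """Hard clustering: index of the single nearest center (first one on ties)."""
    d = squared_distances(point, centers)
    return d.index(min(d))


def soft_assignment(point, centers, temperature=1.0):
    """Soft clustering: probabilities from a softmax over negative squared distances.

    This softmax form is an assumption for illustration, not the paper's model.
    """
    d = squared_distances(point, centers)
    smallest = min(d)  # shift by the minimum for numerical stability
    weights = [math.exp(-(di - smallest) / temperature) for di in d]
    total = sum(weights)
    return [w / total for w in weights]


def sparsify(probs, threshold=0.05):
    """Zero out small probabilities and renormalize.

    A simple thresholding stand-in for sparsity; the largest entry is always
    kept so the result remains a valid probability vector.
    """
    largest = max(probs)
    kept = [p if (p >= threshold or p == largest) else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# A point close to the first center ends up with a one-hot (hard-looking) assignment,
# while a point midway between the centers keeps a genuinely probabilistic one.
clear_point = sparsify(soft_assignment((0.1, 0.0), centers))
ambiguous_point = sparsify(soft_assignment((2.0, 0.0), centers))
```

Under these assumptions, most points behave like `clear_point` and only ambiguous points such as the midpoint retain more than one nonzero probability, which is the qualitative behavior the abstract describes.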