An adaptive mutual K-nearest neighbors clustering algorithm based on maximizing mutual information
Published in: | Pattern Recognition 2023-05, Vol.137, p.109273, Article 109273 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | • We propose a new voting method, called VCMNN, that improves the clustering results of the conventional CMNN algorithm and overcomes its important limitation of misidentifying noise. • We propose a new clustering framework that needs no parameter adjustment, based on the theories of co-occurrence and mutual information. Using them to automatically find better parameter values is novel. • We use six synthetic datasets, ten UCI datasets, and four image datasets to evaluate the proposed method, and the experiments show that the proposed VCMNN and AVCMNN outperform three classical clustering algorithms and six SOTA clustering algorithms in most cases.
Clustering based on Mutual K-nearest Neighbors (CMNN) is a classical method for grouping data into clusters. However, it has two well-known limitations: (1) the clustering results depend strongly on the parameter k; (2) CMNN assumes that noise points correspond to small clusters under the Mutual K-nearest Neighbors (MKNN) criterion, but some data points in small clusters are wrongly identified as noise. To address these two issues, we propose an adaptive improved CMNN algorithm (AVCMNN), which consists of two parts: (1) an improved CMNN algorithm (abbreviated as VCMNN) and (2) an adaptive VCMNN algorithm (abbreviated as AVCMNN). In the first part, VCMNN, we reassign the data points in some small clusters using a novel voting strategy, because some of them are wrongly identified as noise points; this improves the clustering results. In the second part, AVCMNN, we construct an objective function based on maximizing mutual information to optimize the parameters of the proposed method, obtaining better parameter values and clustering results. We conduct extensive experiments on twenty datasets, including six synthetic datasets, ten UCI datasets, and four image datasets. The experimental results show that VCMNN and AVCMNN outperform three classical algorithms (i.e., CMNN, DPC, and DBSCAN) and six state-of-the-art (SOTA) clustering algorithms in most cases. |
---|---|
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2022.109273 |
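
The abstract describes clustering on a mutual k-nearest-neighbor graph, followed by a voting step that reassigns points wrongly treated as noise. The record does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' VCMNN: clusters are taken as connected components of the mutual k-NN graph, components below a size threshold are marked as provisional noise, and each such point then takes the majority label among its k nearest neighbors. The function names, the `min_size` threshold, and the plain majority vote are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def mutual_knn_clusters(points, k, min_size=3):
    """Label connected components of the mutual k-NN graph.

    Components smaller than min_size (a hypothetical threshold) get the
    label -1, meaning "provisional noise".
    """
    knn = knn_sets(points, k)
    n = len(points)
    # An edge i-j exists only if each point is among the other's k neighbors.
    adj = [[j for j in knn[i] if i in knn[j]] for i in range(n)]
    labels = [-1] * n
    seen = [False] * n
    next_label = 0
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search to collect one connected component.
        seen[start] = True
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        if len(component) >= min_size:
            for u in component:
                labels[u] = next_label
            next_label += 1
    return labels, knn


def reassign_by_vote(labels, knn):
    """Assumed voting rule: a provisional noise point takes the majority
    label among its k nearest neighbors, if any of them is clustered."""
    new_labels = list(labels)
    for i, label in enumerate(labels):
        if label != -1:
            continue
        votes = Counter(labels[j] for j in knn[i] if labels[j] != -1)
        if votes:
            new_labels[i] = votes.most_common(1)[0][0]
    return new_labels
```

In this sketch the result still depends on `k`, which is the first limitation the abstract names; the adaptive part (AVCMNN) that selects parameters by maximizing mutual information is not reproduced here because the record does not specify its objective function.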