Self-adaptive kernel K-means algorithm based on the shuffled frog leaping algorithm


Bibliographic details
Published in: Soft Computing (Berlin, Germany), 2018-02, Vol. 22 (3), p. 861-872
Main authors: Fan, Shuyan; Ding, Shifei; Xue, Yu
Format: Article
Language: English
Description
Summary: Kernel K-means can handle nonlinearly separable datasets by mapping the input data into a high-dimensional feature space. The kernel matrix reflects the inner structure of the data, so constructing an appropriate kernel matrix is key. However, many kernel-based methods require the kernel parameter to be set manually in advance. Choosing an appropriate kernel parameter by hand for each dataset is difficult, which limits the performance of the kernel K-means algorithm to some extent, so a method is needed that adjusts the kernel parameter automatically according to the data structure. In addition, the number of clusters also needs to be set. To overcome these challenges, this paper proposes a self-adaptive kernel K-means based on the shuffled frog leaping algorithm, which regards the kernel parameter and the number of clusters as the position information of a frog. We design a clustering validity index suitable for the kernel space (KBWP) by modifying the Between-Within Proportion (BWP) clustering validity index. KBWP is used as the fitness in the shuffled frog leaping algorithm, and local and global optimization are repeated until the maximum number of iterations is reached. The kernel parameter and the number of clusters corresponding to the maximum fitness are taken as optimal. We experimentally verify the algorithm on artificial and real datasets. The experimental results demonstrate the effectiveness and good performance of the proposed algorithm.
ISSN:1432-7643
1433-7479
DOI:10.1007/s00500-016-2389-2
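
The summary above rests on kernel K-means, where distances to cluster means are computed in the feature space using only kernel values. The following is a minimal sketch of that building block, not the authors' implementation. It assumes a Gaussian (RBF) kernel with width `sigma` as the kernel parameter (the summary does not name the kernel), and the function names `rbf_kernel_matrix` and `kernel_kmeans` and the optional `init` argument are hypothetical. The KBWP index and the shuffled frog leaping search over the kernel parameter and the number of clusters are not reproduced here, since the record does not define them.

```python
import numpy as np


def rbf_kernel_matrix(X, sigma):
    """Gaussian kernel (an assumed choice): K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def kernel_kmeans(K, n_clusters, n_iter=100, seed=0, init=None):
    """Lloyd-style kernel K-means on a precomputed kernel matrix K.

    `init` (hypothetical argument) optionally supplies starting labels;
    otherwise labels are drawn uniformly at random.
    """
    n = K.shape[0]
    if init is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(n_clusters, size=n)
    else:
        labels = np.asarray(init).copy()
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster keeps infinite distance
            # Kernel trick: ||phi(x_i) - m_c||^2
            #   = K_ii - (2 / |c|) * sum_j K_ij + (1 / |c|^2) * sum_{j,l} K_jl,
            # with j and l ranging over the members of cluster c.
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

A call such as `kernel_kmeans(rbf_kernel_matrix(X, sigma=1.0), n_clusters=2)` returns one integer label per row of `X`. In this sketch both `sigma` and `n_clusters` are fixed by hand, which is exactly the manual step the paper aims to replace with an automatic search.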