Space Structure and Clustering of Categorical Data

Learning from categorical data plays a fundamental role in such areas as pattern recognition, machine learning, data mining, and knowledge discovery. To effectively discover the group structure inherent in a set of categorical objects, many categorical clustering algorithms have been developed in th...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transaction on neural networks and learning systems 2016-10, Vol.27 (10), p.2047-2059
Hauptverfasser: Qian, Yuhua, Li, Feijiang, Liang, Jiye, Liu, Bing, Dang, Chuangyin
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Learning from categorical data plays a fundamental role in such areas as pattern recognition, machine learning, data mining, and knowledge discovery. To effectively discover the group structure inherent in a set of categorical objects, many categorical clustering algorithms have been developed in the literature, among which k-modes-type algorithms are very representative because of their good performance. Nevertheless, there is still much room for improving their clustering performance in comparison with the clustering algorithms for the numeric data. This may arise from the fact that the categorical data lack a clear space structure as that of the numeric data. To address this issue, we propose, in this paper, a novel data-representation scheme for the categorical data, which maps a set of categorical objects into a Euclidean space. Based on the data-representation scheme, a general framework for space structure based categorical clustering algorithms (SBC) is designed. This framework together with the applications of two kinds of dissimilarities leads two versions of the SBC-type algorithms. To verify the performance of the SBC-type algorithms, we employ as references four representative algorithms of the k-modes-type algorithms. Experiments show that the proposed SBC-type algorithms significantly outperform the k-modes-type algorithms.
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2015.2451151