Discovering knowledge from data clustering using automatically-defined interval type-2 fuzzy predicates
Published in: | Expert Systems with Applications 2017-02, Vol. 68, pp. 136-150 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | • A new clustering method based on interval type-2 fuzzy predicates is proposed. • Fuzzy predicates describing the clusters are generated automatically from the data. • Interval type-2 membership functions model variability and vagueness in the clusters. • Linguistic descriptions and knowledge are extracted from the predicates. • The method can be applied in data analysis applications.
In data clustering, fuzzy predicates act as cluster descriptors, providing linguistically expressed knowledge about how features relate to each cluster. Fuzzy predicates obtained directly and automatically from data make it possible to discover knowledge inside clusters, even when no prior information about the clustering problem is available. This work proposes a new method for the automatic discovery of interval type-2 fuzzy predicates in data clustering, called Type-2 Data-based Fuzzy Predicate Clustering (T2-DFPC). In a first stage, a data analysis is performed by randomly partitioning the original data and running a clustering scheme that automatically determines a suitable number of clusters. The interval type-2 fuzzy predicates are then discovered from the results of this stage. Results on very different clustering datasets show that T2-DFPC was consistently among the best methods in terms of accuracy. The method preserves all the known advantages of interval type-2 fuzzy logic (FL) for problems involving vagueness, quantifying the degree of truth of the fuzzy predicates and modelling the variability of the data inside the clusters. The proposed method is a fast, useful, general, and unsupervised approach to interpretable data clustering, and its knowledge-extraction capability is one of its main contributions. The linguistic expressions can easily be adapted to match the terminology of the field the data come from. Like an intelligent system, the predicates can generalize the extracted knowledge to new cases (new data). This approach may be particularly useful in contexts where, besides the clustering partition, summary information about the data is of interest. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2016.10.018 |
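The abstract says that interval type-2 membership functions model the variability and vagueness inside clusters and that the predicates carry a degree of truth, but it does not give the membership-function shape or how it is fitted. The minimal Python sketch below assumes Gaussian membership functions with a fixed mean and an uncertain standard deviation, takes the footprint of uncertainty from the spread of per-part standard deviations over a random split of each cluster, combines features with the min t-norm, and assigns new data to the cluster whose predicate interval has the largest midpoint. `IT2Gaussian`, `fit_predicate`, `predicate_truth` and `assign` are hypothetical names chosen for illustration, not the published T2-DFPC implementation.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass
class IT2Gaussian:
    """Interval type-2 Gaussian MF: fixed mean, uncertain std in [sigma_lo, sigma_hi]."""
    mean: float
    sigma_lo: float
    sigma_hi: float

    def interval(self, x: float) -> tuple:
        # With a shared mean, the narrower Gaussian never exceeds the wider one,
        # so sigma_lo gives the lower MF and sigma_hi gives the upper MF.
        d2 = (x - self.mean) ** 2
        return (math.exp(-d2 / (2 * self.sigma_lo ** 2)),
                math.exp(-d2 / (2 * self.sigma_hi ** 2)))


def fit_predicate(points, n_parts=2, seed=0, floor=1e-6):
    """Hypothetical fitting rule: one IT2 MF per feature. The min and max of the
    per-part standard deviations over a random partition of the cluster
    bound the uncertain sigma (floored to avoid division by zero)."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    mfs = []
    for j in range(len(points[0])):
        mean = statistics.fmean(p[j] for p in points)
        sigmas = [statistics.pstdev([p[j] for p in part]) for part in parts]
        mfs.append(IT2Gaussian(mean, max(min(sigmas), floor), max(max(sigmas), floor)))
    return mfs


def predicate_truth(mfs, x):
    """Interval truth of 'x_1 is A_1 AND ... AND x_n is A_n' using the min t-norm."""
    lowers, uppers = zip(*(mf.interval(v) for mf, v in zip(mfs, x)))
    return min(lowers), min(uppers)


def assign(predicates, x):
    """Assumed decision rule: the cluster whose truth interval has the largest midpoint."""
    return max(predicates, key=lambda k: sum(predicate_truth(predicates[k], x)) / 2)


# Usage on two small, made-up clusters.
clusters = {
    "A": [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)],
    "B": [(5.0, 6.0), (5.3, 5.7), (4.8, 6.1), (5.1, 5.9)],
}
predicates = {name: fit_predicate(pts) for name, pts in clusters.items()}
print(assign(predicates, (1.05, 2.0)))  # -> A
print(predicate_truth(predicates["A"], (1.5, 2.5)))  # (lower, upper) truth interval
```

The midpoint is only the simplest way to rank truth intervals; other interval-ranking rules could replace it without changing the rest of the sketch.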
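The first stage of T2-DFPC is described only as a random partition of the data followed by a clustering scheme that picks the number of clusters automatically; the abstract does not name that scheme. As a stand-in, the Python sketch below uses k-means with deterministic farthest-first initialization, selects k by the mean silhouette coefficient, and repeats the selection on each part of a random partition. `kmeans`, `silhouette`, `choose_k` and `k_per_random_part` are hypothetical helpers written for illustration, and silhouette-based selection is a swapped-in technique, not necessarily the one used in the paper.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first initialization; returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # p's own distance is 0
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def choose_k(points, k_range=range(2, 6)):
    """Pick the k whose k-means partition has the highest mean silhouette."""
    return max(k_range, key=lambda k: silhouette(points, kmeans(points, k)))


def k_per_random_part(points, n_parts=2, seed=0, k_range=range(2, 6)):
    """Randomly partition the data and estimate k independently on each part."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    return [choose_k(part, k_range) for part in parts]


# Usage: three well-separated 3x3 grids of made-up points.
blobs = [(0, 0), (10, 0), (0, 10)]
data = [(cx + dx, cy + dy) for cx, cy in blobs for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
print(choose_k(data))  # -> 3
ks = k_per_random_part(data)
print(ks, Counter(ks).most_common(1)[0][0])
```

Disagreement between the parts would suggest that the data do not determine the number of clusters clearly; how T2-DFPC actually combines the parts is not stated in the abstract.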