Unsupervised Coupled Metric Similarity for Non-IID Categorical Data


Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2018-09, Vol. 30 (9), pp. 1810-1823
Main authors: Jian, Songlei; Cao, Longbing; Lu, Kai; Gao, Hang
Format: Article
Language: English
Description
Abstract: Appropriate similarity measures always play a critical role in data analytics, learning, and processing. Measuring the intrinsic similarity of categorical data for unsupervised learning has not been substantially addressed, and even less effort has been made for the similarity analysis of categorical data that is not independent and identically distributed (non-IID). In this work, a Coupled Metric Similarity (CMS) is defined for unsupervised learning, which flexibly captures the value-to-attribute-to-object heterogeneous coupling relationships. CMS learns the similarities in terms of intrinsic heterogeneous intra- and inter-attribute couplings and attribute-to-object couplings in categorical data. The validity of CMS is guaranteed by satisfying metric properties and conditions, and CMS can flexibly adapt from IID to non-IID data. CMS is incorporated into spectral clustering and k-modes clustering and compared with relevant state-of-the-art similarity measures that are not necessarily metrics. The experimental results and theoretical analysis show that CMS effectively captures independent and coupled data characteristics and significantly outperforms other similarity measures on most datasets.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2018.2808532
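
Note on the abstract: it states that the validity of CMS rests on satisfying metric properties, and that some competing similarity measures are not necessarily metrics. The sketch below illustrates what such a check can look like on a small categorical dataset. It does not implement CMS; the simple overlap (normalized Hamming) distance is used as a stand-in, and the names `overlap_distance`, `is_metric`, and the toy data are assumptions made for illustration only.

```python
from itertools import product


def overlap_distance(x, y):
    """Fraction of attributes on which two categorical objects disagree.

    Stand-in dissimilarity for illustration; this is NOT the CMS measure.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def is_metric(objects, dist, tol=1e-12):
    """Exhaustively check the metric axioms on a finite set of distinct objects."""
    for x, y in product(objects, repeat=2):
        d = dist(x, y)
        if d < -tol:                      # non-negativity
            return False
        if (d <= tol) != (x == y):        # identity of indiscernibles
            return False
        if abs(d - dist(y, x)) > tol:     # symmetry
            return False
    for x, y, z in product(objects, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:  # triangle inequality
            return False
    return True


# Hypothetical toy data: two categorical attributes (colour, size).
toy = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("blue", "large"),
]


def squared_overlap(x, y):
    """Squaring the overlap distance breaks the triangle inequality."""
    return overlap_distance(x, y) ** 2


overlap_ok = is_metric(toy, overlap_distance)
squared_ok = is_metric(toy, squared_overlap)
```

On this toy data the overlap distance passes all four checks, while its square fails the triangle inequality. A check of this kind only covers the objects it is given; a proof, as in the paper, is needed to establish the property in general.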