Efficient Mining of Discriminating Relationships Among Attributes Involving Arithmetic Operations

Bibliographic Details
Published in: Computational Intelligence, 2016-02, Vol. 32 (1), pp. 102-126
Main authors: Duan, Lei; Dong, Guozhu; Wang, Xianming; Tang, Changjie
Format: Article
Language: English
Description
Summary: Contrast patterns describe differences between two or more data sets or data classes; they have proven useful for solving many kinds of problems, such as building accurate classifiers, defining clustering quality measures, and analyzing disease subtypes. This article investigates the mining of a new kind of contrast pattern, namely discriminating inter‐attribute functions (DIFs), which represent arithmetic‐expression‐based inter‐attribute relationships that distinguish classes of data. DIFs are an expressive and practical alternative to item‐based contrast patterns and can express discriminating relationships such as “weight/(height)² is more likely to be ≤ 25 in one class than in another class.” Besides introducing the DIF mining problem, this article makes theoretical and algorithmic contributions to the problem. We prove that DIF mining is MAX SNP‐hard. To mine DIFs efficiently, we present a set of rules that prune the search space of arithmetic expressions by eliminating redundant expressions (those equivalent to others). We give two algorithms: one for finding all DIFs satisfying given thresholds and another for finding certain optimal DIFs using genetic computation techniques. The former is useful when the number of attributes is small, whereas the latter is useful when that number is large; both use the redundant arithmetic‐expression pruning rules. A performance study shows that our techniques are effective and efficient for finding DIFs.
ISSN: 0824-7935, 1467-8640
DOI: 10.1111/coin.12052
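
Illustrative sketch (not taken from the article): the summary's example relationship, "weight/(height)² ≤ 25", can be checked on labeled data by measuring how often it holds in each class. The function names, the toy records, and the use of a simple difference in per-class frequency as the measure of discrimination are all assumptions made for illustration; the article's actual thresholds and algorithms are not described in this record.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def support(records: List[Record],
            expr: Callable[[Record], float],
            threshold: float) -> float:
    """Fraction of records whose expression value is <= threshold."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if expr(r) <= threshold)
    return hits / len(records)


def bmi(r: Record) -> float:
    """The arithmetic expression from the summary: weight / height^2."""
    return r["weight"] / r["height"] ** 2


# Toy data, invented for this sketch (weight in kg, height in m).
class_a = [
    {"weight": 60.0, "height": 1.70},
    {"weight": 72.0, "height": 1.80},
    {"weight": 90.0, "height": 1.75},
    {"weight": 55.0, "height": 1.60},
]
class_b = [
    {"weight": 95.0, "height": 1.70},
    {"weight": 88.0, "height": 1.65},
    {"weight": 70.0, "height": 1.80},
    {"weight": 100.0, "height": 1.78},
]

sup_a = support(class_a, bmi, 25.0)
sup_b = support(class_b, bmi, 25.0)

# Assumed discrimination measure: difference in per-class frequency.
gap = sup_a - sup_b
print(f"class A: {sup_a:.2f}, class B: {sup_b:.2f}, gap: {gap:.2f}")
```

A large gap suggests the expression separates the two classes on this data; a mining procedure would search over many candidate expressions and thresholds rather than test a single hand-picked one.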