Efficient discovery of contrast subspaces for object explanation and characterization

Bibliographic details
Published in:	Knowledge and information systems 2016-04, Vol.47 (1), p.99-129
Main authors:	Duan, Lei; Tang, Guanting; Pei, Jian; Bailey, James; Dong, Guozhu; Nguyen, Vinh; Campbell, Akiko; Tang, Changjie
Format:	Article
Language:	English
Subjects:	Analysis; Cardiovascular disease; Computer Science; Coronary vessels; Data mining; Data Mining and Knowledge Discovery; Database Management; Datasets; Density; Information Storage and Retrieval; Information systems; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Kernels; Methods; Mining; Pruning; Regular Paper; Studies; Subspaces; Texts; Vein & artery diseases
Online access:	Full text

Description
Summary:	We tackle the novel problem of mining contrast subspaces. Given a set of multidimensional objects in two classes C+ and C- and a query object o, we want to find the top-k subspaces that maximize the ratio of the likelihood of o in C+ to that in C-. Such subspaces are very useful for characterizing an object and explaining how it differs between two classes. We demonstrate that this problem has important applications and, at the same time, is very challenging, being MAX SNP-hard. We present CSMiner, a mining method that uses kernel density estimation in conjunction with various pruning techniques. We experimentally investigate the performance of CSMiner on a range of data sets, evaluating its efficiency, effectiveness, and stability, and demonstrating that it is substantially faster than a baseline method.
ISSN:	0219-1377; 0219-3116
DOI:	10.1007/s10115-015-0835-6
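
The problem stated in the summary can be illustrated with a small sketch. It is not CSMiner and is not taken from the article: it scores every non-empty subspace exhaustively, with no pruning, so it is only practical for a handful of dimensions. The function names (kde, top_k_contrast_subspaces), the fixed Gaussian bandwidth, the small eps guard against division by zero, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def kde(query, points, dims, bandwidth=1.0):
    """Gaussian kernel density estimate of `query`, using only the
    coordinates listed in `dims` (hypothetical helper, fixed bandwidth)."""
    d = len(dims)
    norm = len(points) * (bandwidth * math.sqrt(2 * math.pi)) ** d
    total = 0.0
    for p in points:
        sq_dist = sum((query[i] - p[i]) ** 2 for i in dims)
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / norm


def top_k_contrast_subspaces(query, pos, neg, k=3, bandwidth=1.0, eps=1e-12):
    """Rank all non-empty subspaces by the likelihood ratio of `query`
    in class `pos` (C+) against class `neg` (C-). Exhaustive search:
    2^n - 1 subspaces for n dimensions."""
    n_dims = len(query)
    scored = []
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            like_pos = kde(query, pos, dims, bandwidth)
            like_neg = kde(query, neg, dims, bandwidth)
            # eps is an assumed guard so an empty-density C- cannot divide by zero
            scored.append((like_pos / (like_neg + eps), dims))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy data (made up): dimension 0 separates the classes around the query,
# while dimension 1 makes the query look more like C- than C+.
pos = [(0.0, 2.0), (0.1, 3.0), (-0.1, 2.5)]
neg = [(5.0, 0.0), (5.1, 0.2), (4.9, -0.2)]
query = (0.0, 0.0)

ranking = top_k_contrast_subspaces(query, pos, neg, k=3)
for ratio, dims in ranking:
    print(dims, ratio)
```

On this toy data the subspace consisting of dimension 0 alone ranks first, which matches the intuition that a contrast subspace isolates the attributes in which the query looks typical of C+ and atypical of C-. The exponential number of candidate subspaces is the reason the summary mentions pruning techniques.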