Mining outlying aspects on numeric data

Detailed description

Bibliographic details
Published in:	Data mining and knowledge discovery 2015-09, Vol.29 (5), p.1116-1151
Main authors:	Duan, Lei; Tang, Guanting; Pei, Jian; Bailey, James; Campbell, Akiko; Tang, Changjie
Format:	Article
Language:	eng
Subjects:	Artificial Intelligence; Chemistry and Earth Sciences; Computer Science; Data mining; Data Mining and Knowledge Discovery; Datasets; Density; Fraud; Information Storage and Retrieval; Mathematical analysis; Mining; Physics; Query processing; Searching; Statistics for Engineering; Subspaces; Texts
Online access:	Full text

Description
Summary:	When we are investigating an object in a data set, which itself may or may not be an outlier, can we identify unusual (i.e., outlying) aspects of the object? In this paper, we identify the novel problem of mining outlying aspects on numeric data. Given a query object o in a multidimensional numeric data set O, in which subspace is o most outlying? Technically, we use the rank of the probability density of an object in a subspace to measure the outlyingness of the object in the subspace. A minimal subspace where the query object is ranked the best is an outlying aspect. Computing the outlying aspects of a query object is far from trivial. A naïve method has to calculate the probability densities of all objects and rank them in every subspace, which is very costly when the dimensionality is high. We systematically develop a heuristic method that is capable of searching data sets with tens of dimensions efficiently. Our empirical study using both real data and synthetic data demonstrates that our method is effective and efficient.
ISSN:	1384-5810 1573-756X
DOI:	10.1007/s10618-014-0398-2
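
Illustrative sketch:	The naïve method mentioned in the summary can be outlined as below. This is only a sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel with one fixed bandwidth for every subspace, and it reads "minimal" as "no proper subset of the subspace reaches the same best rank". The function names (kde_density, density_rank, naive_outlying_aspects) are hypothetical and do not come from the paper.

```python
import itertools
import math


def kde_density(points, x, bandwidth):
    """Unnormalised Gaussian kernel density estimate at x.

    Constant factors are dropped because only the ordering of densities
    inside one subspace matters for ranking.
    """
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, x))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / len(points)


def density_rank(data, query_idx, dims, bandwidth=1.0):
    """Rank of the query object's density in the subspace `dims`.

    Rank 1 means no object has a strictly lower density, i.e. the query
    object is the most outlying one in that subspace.
    """
    projected = [tuple(row[d] for d in dims) for row in data]
    densities = [kde_density(projected, p, bandwidth) for p in projected]
    q = densities[query_idx]
    return 1 + sum(1 for v in densities if v < q)


def naive_outlying_aspects(data, query_idx, max_dim=None, bandwidth=1.0):
    """Enumerate every subspace up to max_dim dimensions (exponential cost).

    Returns (best_rank, subspaces), keeping only subspaces that achieve
    the best rank and have no proper subset achieving it as well
    (an assumed reading of "minimal").
    """
    n_dims = len(data[0])
    max_dim = max_dim or n_dims
    best_rank, best = None, []
    for k in range(1, max_dim + 1):
        for dims in itertools.combinations(range(n_dims), k):
            rank = density_rank(data, query_idx, dims, bandwidth)
            if best_rank is None or rank < best_rank:
                best_rank, best = rank, [dims]
            elif rank == best_rank:
                # Skip subspaces that contain an already found one.
                if not any(set(b) < set(dims) for b in best):
                    best.append(dims)
    return best_rank, best


# Toy data: the last object is ordinary in dimension 0, extreme in dimension 1.
data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.1, 0.2), (0.15, 5.0)]
rank, aspects = naive_outlying_aspects(data, query_idx=4)
print(rank, aspects)
```

Every subspace requires a density estimate for every object, so the cost grows exponentially with the number of dimensions; this is the cost the summary describes as motivating a heuristic search.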