Anomaly detection with correlation laws
| Published in: | Data & Knowledge Engineering, 2023-05, Vol. 145, p. 102181, Article 102181 |
---|---|
| Main authors: | , , |
| Format: | Article |
| Language: | English |
| Abstract: | Datasets from different domains usually contain data defined over a wide set of attributes among which various degrees of correlation exist. Identifying data objects that do not comply with these hidden correlations is a formidable task. Moreover, attributes often play different roles in applications. Specifically, some features can be perceived as independent variables that define a context in which a dependent variable exhibits anomalous values. Hence, in this work we focus on detecting data objects that show anomalous behavior on a subset of attributes, called behavioral, with respect to some other attributes, called contextual. As our main contribution, we design a model that describes the correlation laws hidden in data distributions over pairs of behavioral–contextual attributes. We introduce a probability measure that scores subsequently observed objects according to how much their behavior deviates from the detected correlation laws. We test our method on both synthetic and real datasets to demonstrate its effectiveness and show that it outperforms some competitors. Finally, we discuss a case study in gene expression data analysis showing that the method can make a valuable contribution in scenarios where the features are far more abundant than the samples available for the analysis. |
---|---|
| ISSN: | 0169-023X, 1872-6933 |
DOI: | 10.1016/j.datak.2023.102181 |
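The abstract does not spell out the correlation-law model or the exact probability measure, so the following is only a rough illustration of the general idea of scoring a behavioral attribute against a contextual one. It is a minimal Python sketch that swaps in a deliberately simple technique, not the authors' method: the contextual attribute is cut into equal-frequency bins, the behavioral values in each bin are summarized by a mean and standard deviation, and a new object gets a two-sided normal tail probability (low means anomalous). The names `PairLaw`, `anomaly_probability`, `n_bins`, `min_std`, and the rule of combining pairs by taking the minimum are all hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, pstdev


def tail_probability(z: float) -> float:
    """Two-sided standard-normal tail P(|Z| >= |z|); small values mean surprising."""
    return math.erfc(abs(z) / math.sqrt(2.0))


class PairLaw:
    """Hypothetical per-pair model: contextual attribute -> behavioral attribute.

    The contextual axis is cut into equal-frequency bins; inside each bin the
    behavioral values are summarized by a mean and a (floored) standard deviation.
    This is an assumed stand-in, not the paper's correlation-law model.
    """

    def __init__(self, n_bins: int = 4, min_std: float = 1e-3):
        self.n_bins = n_bins
        self.min_std = min_std  # avoids division by zero in constant bins
        self.edges: list[float] = []
        self.stats: list[tuple[float, float]] = []

    def fit(self, context: list[float], behavior: list[float]) -> "PairLaw":
        if not context or len(context) != len(behavior):
            raise ValueError("need equally long, non-empty attribute columns")
        pairs = sorted(zip(context, behavior))
        n = len(pairs)
        # Equal-frequency bin boundaries taken from the sorted contextual values.
        self.edges = [pairs[k * n // self.n_bins][0] for k in range(1, self.n_bins)]
        buckets: list[list[float]] = [[] for _ in range(self.n_bins)]
        for c, b in pairs:
            buckets[bisect_right(self.edges, c)].append(b)
        # Empty bins (possible with tied context values) fall back to global stats.
        fallback = (mean(behavior), max(pstdev(behavior), self.min_std))
        self.stats = [
            (mean(vals), max(pstdev(vals), self.min_std)) if vals else fallback
            for vals in buckets
        ]
        return self

    def score(self, c: float, b: float) -> float:
        """Probability-like score of behavior b given context c (low = anomalous)."""
        mu, sd = self.stats[bisect_right(self.edges, c)]
        return tail_probability((b - mu) / sd)


def anomaly_probability(models: dict, obj: dict) -> float:
    """Combine pair scores by taking the minimum (the most violated law)."""
    return min(m.score(obj[ctx], obj[beh]) for (ctx, beh), m in models.items())


# Usage on synthetic data where y roughly follows 2 * x.
xs = list(range(100))
ys = [2 * x + (x % 3) - 1 for x in xs]
models = {("x", "y"): PairLaw(n_bins=4).fit(xs, ys)}
normal = anomaly_probability(models, {"x": 62, "y": 124})  # follows the law
odd = anomaly_probability(models, {"x": 62, "y": 300})     # violates it
print(f"normal={normal:.3f} odd={odd:.2e}")
```

Binning the context keeps the sketch dependency-free and makes the "context defines what is normal" idea explicit; a real implementation would likely model the dependence more finely and handle many behavioral–contextual pairs at once.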