Robust outlier detection based on the changing rate of directed density ratio


Bibliographic details
Published in: Expert Systems with Applications, 2022-11, Vol. 207, p. 117988, Article 117988
Main authors: Li, Kangsheng; Gao, Xin; Fu, Shiyuan; Diao, Xinping; Ye, Ping; Xue, Bing; Yu, Jiahao; Huang, Zijian
Format: Article
Language: English
Online access: Full text
Description
Summary: Outlier detection aims to identify abnormal objects that deviate from the distribution of normal data. Traditional unsupervised outlier detection methods can detect most global outliers, but they perform well only when the data follow a relatively simple, single distribution. Methods based on k-nearest neighbors can fit more complex data distributions, but they often struggle to detect local outliers, or their performance is easily influenced by data manifolds. In addition, the detection performance of most k-nearest-neighbor methods is strongly affected by the parameter k. We propose a robust outlier detection method based on the changing rate of the directed density ratio. The local density of each sample is calculated by combining kernel density estimation with an extended neighbor set, which contains both the k-nearest neighbors and the reverse k-nearest neighbors. We then define the directed density ratio of a sample based on the density ratio and the vectors between the sample and its neighbors. The directed density ratio gives a better estimate of local information under different local densities and data manifolds. Next, as the size of the neighborhood is increased, the change in the directed density ratio of each sample is calculated, and these changes are summed to give the outlier score. Experiments are carried out on 12 synthetic datasets that simulate different data distributions and on 22 public datasets. The results show that, compared with several state-of-the-art methods, the proposed method achieves better outlier detection performance under different data distributions. The proposed method is also more robust to changes in the parameter k.
• Propose an outlierness measure, the directed density ratio, based on nearest neighbors.
• Propose an outlier detector named DCROD based on the directed density ratio.
• DCROD is robust against changes in the parameter and the data distribution.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2022.117988
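
The summary above names the steps of the method but not its formulas, so the following is only a rough NumPy sketch of how such a pipeline could be put together, not the published DCROD definition. Everything specific here is an assumption made for illustration: the function names (extended_neighbors, directed_density_ratio, changing_rate_score), the Gaussian kernel with a fixed bandwidth, the use of the norm of density-ratio-weighted unit vectors as the "directed density ratio", and the sum of absolute differences over consecutive k as the "changing rate".

```python
import numpy as np


def extended_neighbors(dist: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask M with M[i, j] True when j is one of the k nearest
    neighbors of i, or i is one of the k nearest neighbors of j."""
    n = dist.shape[0]
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    knn_idx = np.argsort(d, axis=1, kind="stable")[:, :k]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), knn_idx.ravel()] = True
    return knn | knn.T  # k-nearest neighbors union reverse k-nearest neighbors


def directed_density_ratio(X, dist, k, bandwidth, eps=1e-12):
    """ASSUMED definition: norm of the mean of unit vectors toward each
    extended neighbor, each weighted by density(neighbor) / density(sample)."""
    mask = extended_neighbors(dist, k)
    counts = mask.sum(axis=1)  # at least k entries per row
    # Gaussian kernel density over the extended neighbor set (assumed kernel).
    kernel = np.exp(-dist**2 / (2.0 * bandwidth**2))
    density = (kernel * mask).sum(axis=1) / counts
    ratio = density[None, :] / (density[:, None] + eps)  # ratio[i, j] = dens_j / dens_i
    diff = X[None, :, :] - X[:, None, :]                 # diff[i, j] = x_j - x_i
    unit = diff / (dist[:, :, None] + eps)               # zero vector on the diagonal
    weighted = (ratio * mask)[:, :, None] * unit
    return np.linalg.norm(weighted.sum(axis=1) / counts[:, None], axis=1)


def changing_rate_score(X, k_min=2, k_max=6, bandwidth=1.0):
    """ASSUMED score: sum of absolute changes of the directed density ratio
    as the neighborhood size grows from k_min to k_max."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k_min < k_max < X.shape[0]:
        raise ValueError("need 1 <= k_min < k_max < number of samples")
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ddr = np.stack([
        directed_density_ratio(X, dist, k, bandwidth)
        for k in range(k_min, k_max + 1)
    ])
    return np.abs(np.diff(ddr, axis=0)).sum(axis=0)
```

Under these assumptions a larger score marks a more outlying sample: for a point inside a cluster the weighted vectors toward its neighbors largely cancel, while for an isolated point they all point toward the denser region. The fixed bandwidth ties the sketch to the scale of the data, and the dense pairwise arrays limit it to small datasets; the actual definitions and parameter choices should be taken from the article itself.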