A High-Dimensional Outlier Detection Approach Based on Local Coulomb Force


Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, June 2023, Vol. 35, No. 6, pp. 5506-5520
Main authors: Zhu, Pengyun; Zhang, Chaowei; Li, Xiaofeng; Zhang, Jifu; Qin, Xiao
Format: Article
Language: English
Description
Abstract: Traditional outlier detection methods are inadequate for high-dimensional data analysis because distances tend to concentrate (the "curse of dimensionality"). Inspired by Coulomb's law, we propose a new similarity measure vector for high-dimensional data, which consists of the outlier Coulomb force and the outlier Coulomb resultant force. The outlier Coulomb force not only gauges the similarity among data objects effectively, but also fully reflects the differences among the dimensions of data objects through vector projection onto each dimension. More importantly, the Coulomb resultant force effectively measures the deviation of data objects from a data center, making detection results interpretable. We introduce a new neighborhood outlier factor, on which we build a high-dimensional outlier detection algorithm. In our approach, attribute values with a high degree of deviation are treated as interpretable information about outlier data. Finally, we implement and evaluate our algorithm using UCI and synthetic datasets. Our experimental results show that the algorithm effectively alleviates the interference of the "curse of dimensionality". The findings confirm that the high-dimensional outliers detected by the algorithm are interpretable.
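The abstract does not state the paper's formulas, so the following is only a minimal sketch of the two ideas it mentions, under stated assumptions. It first shows the distance concentration that motivates the work. It then shows how an inverse-square (Coulomb-style) force vector decomposes into per-dimension projections, and how the vector sum over an object's nearest neighbours points away from that neighbourhood. The sketch assumes unit charges, the textbook form F = k q1 q2 / r^2 acting along the line joining two points, and a k-nearest-neighbour neighbourhood. The function names `relative_contrast`, `coulomb_force`, `resultant_force` and `most_deviating_dimension` are hypothetical. None of this reproduces the authors' outlier Coulomb force, Coulomb resultant force or neighborhood outlier factor.

```python
import numpy as np


def relative_contrast(X):
    """(max - min) / min of the Euclidean distances from X[0] to the other rows.

    For i.i.d. random data this ratio shrinks as the number of columns grows,
    which is the distance concentration behind the "curse of dimensionality".
    """
    d = np.linalg.norm(X[1:] - X[0], axis=1)
    return (d.max() - d.min()) / d.min()


def coulomb_force(x, y, q_x=1.0, q_y=1.0, k=1.0, eps=1e-12):
    """Inverse-square force exerted on x by y (like charges repel).

    Hypothetical illustration: magnitude k*q_x*q_y / r**2, direction (x - y) / r,
    so entry j of the returned vector is the projection onto dimension j.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    r = np.linalg.norm(diff)
    return k * q_x * q_y * diff / (r ** 3 + eps)  # eps guards against r == 0


def resultant_force(X, i, n_neighbors=5):
    """Vector sum of the forces exerted on X[i] by its n_neighbors nearest rows."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # exclude the object itself from its own neighbourhood
    neighbors = np.argsort(d)[:n_neighbors]
    return np.sum([coulomb_force(X[i], X[j]) for j in neighbors], axis=0)


def most_deviating_dimension(force):
    """Index of the dimension with the largest absolute force component."""
    return int(np.argmax(np.abs(force)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Relative contrast of distances for uniform data of growing dimension.
    for dim in (2, 10, 100, 1000):
        print(dim, relative_contrast(rng.random((500, dim))))

    # A small 2-D cluster plus one object displaced along dimension 0.
    X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)), [[3.0, 0.1]]])
    F = resultant_force(X, i=len(X) - 1, n_neighbors=10)
    print("resultant force:", F, "dominant dimension:", most_deviating_dimension(F))
```

Working with force vectors rather than scalar distances keeps one component per attribute. That is the property the abstract relies on when it reports high-deviation attribute values as interpretable information about an outlier. An object surrounded symmetrically by its neighbours receives a near-zero resultant, while an object at the edge of its neighbourhood is pushed outward along the attributes in which it differs.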
ISSN: 1041-4347 (print), 1558-2191 (electronic)
DOI:10.1109/TKDE.2022.3172167