A parallel anomaly detection method and system based on MapReduce

Bibliographic details
Main authors: QI XIAOGANG, HU SHAOLIN, HU QIUQIU, LIU LIFANG, FENG HAILIN
Format: Patent
Language: Chinese; English
Description
Summary: The invention belongs to the technical field of devices or methods for digital computing or data processing adapted to a specific application, and discloses a parallel anomaly detection method and system based on MapReduce. A data set stored on the Hadoop Distributed File System is randomly divided into multiple data blocks as required. The MapReduce framework computes the local outlier factor (LOF) of the data points in each block in parallel, with k-distinct-neighbor replacing k-nearest-neighbor. The data points in each block whose LOF value exceeds a set threshold are then merged and their LOF values recalculated. When processing large amounts of data, MR-DLOF runs clearly more efficiently than the LOF algorithm.
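
The three steps named in the summary (random partitioning, per-block LOF, merging and rescoring of candidates) can be sketched on a single machine as follows. This is an illustrative sketch, not the patented implementation. The record does not define k-distinct-neighbor, so the code assumes it means the k nearest neighbors among points with distinct coordinates. It also assumes that merged candidates are rescored against one another, which the summary leaves open. The names `lof_distinct` and `mr_dlof` are hypothetical, and a plain loop stands in for the MapReduce jobs.

```python
import math
import random
from collections import defaultdict


def lof_distinct(points, k):
    """LOF score per distinct point (points are coordinate tuples).

    Assumption: "k-distinct-neighbor" is read as the k nearest neighbors
    among distinct coordinates, so duplicates cannot give zero distances.
    Simplification: exactly k neighbors are kept, ties are cut off.
    """
    distinct = sorted(set(points))
    if len(distinct) < 2:
        # A lone point has no neighbors; give it the neutral score 1.0.
        return {p: 1.0 for p in distinct}
    k = min(k, len(distinct) - 1)
    neighbors, k_dist = {}, {}
    for p in distinct:
        ranked = sorted((math.dist(p, q), q) for q in distinct if q != p)[:k]
        neighbors[p] = [q for _, q in ranked]
        k_dist[p] = ranked[-1][0]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = {}
    for p in distinct:
        reach = [max(k_dist[q], math.dist(p, q)) for q in neighbors[p]]
        lrd[p] = len(reach) / sum(reach)
    return {
        p: sum(lrd[q] for q in neighbors[p]) / (len(neighbors[p]) * lrd[p])
        for p in distinct
    }


def mr_dlof(points, k, n_blocks, threshold, seed=0):
    """Single-machine stand-in for the partition / score / merge pipeline."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for p in points:
        # Random assignment of each point to a block.
        blocks[rng.randrange(n_blocks)].append(p)
    candidates = []
    for block in blocks.values():
        # Blocks are independent, so a cluster could score them in parallel.
        scores = lof_distinct(block, k)
        candidates.extend(p for p, s in scores.items() if s > threshold)
    # Assumption: candidates are rescored against the merged candidate set.
    return lof_distinct(candidates, k)


if __name__ == "__main__":
    grid = [(x, y) for x in range(5) for y in range(5)]
    data = grid + [(20, 20)]  # one point far away from the grid
    print(lof_distinct(data, 3)[(20, 20)])
    print(mr_dlof(data, k=3, n_blocks=1, threshold=1.5))
```

The block count, the threshold, and k are inputs here because the summary states only that the data set is divided "as required" and that the threshold is "set". Suitable values would depend on the data.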