Evaluating Prediction Error for Anomaly Detection by Exploiting Matrix Factorization in Rating Systems


Bibliographic details
Published in: IEEE Access 2018-01, Vol. 6, p. 50014-50029
Main authors: Yang, Zhihai; Sun, Qindong; Zhang, Beibei
Format: Article
Language: English
Online access: Full text
Description
Abstract: A rating system provides rating data about products or services and is a key feature of e-commerce websites such as Amazon and TripAdvisor. In practice, rating systems that integrate collaborative recommendation techniques are exposed to profile injection attacks and anomalous ratings. To reduce these risks, a number of detection methods have been developed to defend against such threats. However, they either directly calculate similarity between users or items to distinguish attack profiles from genuine profiles, or apply supervised learning to features extracted from user profiles. In this paper, we propose a stepwise detection method to spot anomalous ratings or attacks, which bypasses the hard problems of similarity calculation and feature extraction. First, a subset of samples is randomly selected from the original user profiles to construct a sub-matrix (rating matrix). A fast max-margin matrix factorization is then employed to predict ratings. After that, suspected items are captured by jointly analyzing the distributions of the mean prediction errors of items and users. Finally, anomalous ratings and potential attackers are returned directly. Extensive experiments on the MovieLens-100K data set demonstrate the effectiveness of the proposed approach compared with benchmark methods. Notably, suspected items in a large-scale real-world data set, Amazon data, are detected by the proposed method and further analyzed from diverse perspectives, including rating distribution, rating intention, and time series analysis.
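The stepwise procedure in the abstract (sample profiles, factorize, inspect mean prediction errors) can be illustrated with a minimal sketch. It is not the authors' implementation: a plain L2-regularized SGD factorization stands in for the fast max-margin matrix factorization, and a simple mean-plus-z-standard-deviations rule stands in for the paper's analysis of the error distributions. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def sample_profiles(ratings, fraction, seed=0):
    """Keep every rating of a random fraction of users (the sub-matrix)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    keep = set(rng.sample(users, max(1, int(len(users) * fraction))))
    return [t for t in ratings if t[0] in keep]


def factorize(ratings, rank=2, epochs=100, lr=0.02, reg=0.05, seed=0):
    """Plain SGD matrix factorization (assumed stand-in, not max-margin MF)."""
    rng = random.Random(seed)
    mu = mean(r for _, _, r in ratings)  # global mean rating
    P = {u: [rng.gauss(0, 0.1) for _ in range(rank)]
         for u in sorted({u for u, _, _ in ratings})}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(rank)]
         for i in sorted({i for _, i, _ in ratings})}

    def predict(u, i):
        return mu + sum(a * b for a, b in zip(P[u], Q[i]))

    triples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(triples)
        for u, i, r in triples:
            err = r - predict(u, i)
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return predict


def mean_abs_errors(ratings, predict):
    """Mean absolute prediction error per item and per user."""
    item_err, user_err = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        e = abs(r - predict(u, i))
        item_err[i].append(e)
        user_err[u].append(e)
    return ({i: mean(v) for i, v in item_err.items()},
            {u: mean(v) for u, v in user_err.items()})


def flag_outliers(errors, z=2.0):
    """Ids whose mean error exceeds mean + z * std (assumed decision rule)."""
    vals = list(errors.values())
    threshold = mean(vals) + z * pstdev(vals)
    return sorted(k for k, e in errors.items() if e > threshold)


# Toy data: 20 users rate 5 items with 3 or 4; users 15-19 also push item 5.
rng = random.Random(42)
ratings = [(u, i, rng.choice([3, 4])) for u in range(20) for i in range(5)]
ratings += [(u, 5, 5) for u in range(15, 20)]  # hypothetical injected ratings

sub = sample_profiles(ratings, fraction=0.8)
predict = factorize(sub)
item_err, user_err = mean_abs_errors(sub, predict)
print("suspected items:", flag_outliers(item_err))
print("suspected users:", flag_outliers(user_err))
```

Which ids are flagged depends on the sample, the factorization hyperparameters, and the threshold `z`, so the toy run only demonstrates the data flow between the steps, not detection quality.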
ISSN:2169-3536
DOI:10.1109/ACCESS.2018.2869271