Anonymizing Big Data Streams Using In-memory Processing: A Novel Model Based on One-time Clustering

Big data privacy preservation is a critical challenge for data mining and data analysis. Existing methods for anonymizing big data streams using k-anonymity algorithms may cause high data loss, low data quality, and identity disclosure. In this paper, we propose a novel model for anonymizing big dat...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of signal processing systems 2024-07, Vol.96 (6-7), p.333-356
Hauptverfasser: Shamsinejad, Elham, Banirostam, Touraj, Pedram, Mir Mohsen, Rahmani, Amir Masoud
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Big data privacy preservation is a critical challenge for data mining and data analysis. Existing methods for anonymizing big data streams using k-anonymity algorithms may cause high data loss, low data quality, and identity disclosure. In this paper, we propose a novel model for anonymizing big data streams using in-memory processing. The model uses a Spark framework to parallelize the anonymization process and a one-time clustering algorithm to avoid multiple iterations and allocate the data to optimal clusters. We evaluate the performance and effectiveness of the model using a real-world dataset and compare it with three popular k-anonymity algorithms: CRUE, Mean-Shift, and DBSCAN. The results show that the model has the lowest data loss and the highest data quality for different data sizes and k-values. The model is scalable, robust, adaptable, and flexible. The model can provide better data for data mining and data analysis while protecting data privacy and preventing data disclosure.
ISSN:1939-8018
1939-8115
DOI:10.1007/s11265-024-01920-z