Anonymizing Big Data Streams Using In-memory Processing: A Novel Model Based on One-time Clustering

Bibliographic details
Published in:	Journal of signal processing systems 2024-07, Vol.96 (6-7), p.333-356
Main authors:	Shamsinejad, Elham; Banirostam, Touraj; Pedram, Mir Mohsen; Rahmani, Amir Masoud
Format:	Article
Language:	English
Subjects:	Algorithms; Big Data; Circuits and Systems; Clustering; Computer Imaging; Data analysis; Data integrity; Data loss; Data mining; Data transmission; Electrical Engineering; Engineering; Image Processing and Computer Vision; Pattern Recognition; Pattern Recognition and Graphics; Privacy; Signal, Image and Speech Processing; Vision
Online access:	Full text

Description
Abstract:	Big data privacy preservation is a critical challenge for data mining and data analysis. Existing methods for anonymizing big data streams using k-anonymity algorithms may cause high data loss, low data quality, and identity disclosure. In this paper, we propose a novel model for anonymizing big data streams using in-memory processing. The model uses a Spark framework to parallelize the anonymization process and a one-time clustering algorithm to avoid multiple iterations and allocate the data to optimal clusters. We evaluate the performance and effectiveness of the model using a real-world dataset and compare it with three popular k-anonymity algorithms: CRUE, Mean-Shift, and DBSCAN. The results show that the model has the lowest data loss and the highest data quality for different data sizes and k-values. The model is scalable, robust, adaptable, and flexible. The model can provide better data for data mining and data analysis while protecting data privacy and preventing data disclosure.
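The abstract does not spell out the one-time clustering algorithm, so the following is only a minimal sketch of the general idea behind k-anonymity via a single grouping pass, not the authors' method. It assumes numeric quasi-identifiers, uses a lexicographic sort as a stand-in for clustering, and runs in plain Python rather than Spark. The function names (`one_pass_groups`, `generalize`, `anonymize`, `is_k_anonymous`) and the sample records are hypothetical.

```python
from collections import Counter


def one_pass_groups(records, k):
    """Sort once, then cut the records into consecutive groups of at least k.

    Assumption: sorting stands in for clustering; the paper's algorithm
    is not described in the abstract and may differ substantially.
    """
    ordered = sorted(records)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # A short trailing group would violate k-anonymity, so merge it
    # into the previous group.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


def generalize(group):
    """Replace every record in a group by the group's (min, max) ranges."""
    dims = range(len(group[0]))
    ranges = tuple(
        (min(r[d] for r in group), max(r[d] for r in group)) for d in dims
    )
    return [ranges] * len(group)


def anonymize(records, k):
    """Group once, then generalize each group; no further iterations."""
    out = []
    for group in one_pass_groups(records, k):
        out.extend(generalize(group))
    return out


def is_k_anonymous(anonymized, k):
    """Every generalized record must be shared by at least k records."""
    return all(count >= k for count in Counter(anonymized).values())


# Hypothetical (age, income) quasi-identifiers.
records = [
    (25, 50000), (27, 52000), (26, 51000),
    (45, 90000), (47, 88000), (46, 91000),
    (60, 30000),
]
anonymized = anonymize(records, k=3)
print(is_k_anonymous(anonymized, k=3))
```

Wider ranges in the output correspond to higher data loss, which is the quantity the abstract reports as lowest for the proposed model. If the input holds fewer than k records in total, this sketch produces a single undersized group and `is_k_anonymous` returns False.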
ISSN:	1939-8018 1939-8115
DOI:	10.1007/s11265-024-01920-z