Distributed L-diversity using spark-based algorithm for large resource description frameworks data

Privacy protection issues for resource description frameworks (RDFs) have emerged over the use of public government open data and the healthcare data of individuals. As these data may include personal information, they must undergo a de-identification process that deletes or replaces parts of the or...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:The Journal of supercomputing 2021-07, Vol.77 (7), p.7270-7286
Hauptverfasser: Jeon, MinHyuk, Temuujin, Odsuren, Ahn, Jinhyun, Im, Dong-Hyuk
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Privacy protection issues for resource description frameworks (RDFs) have emerged over the use of public government open data and the healthcare data of individuals. As these data may include personal information, they must undergo a de-identification process that deletes or replaces parts of the original data. To enable these protections, a method has been developed to apply k-anonymization to RDF data. However, sensitive RDF information anonymized using k-anonymization is not completely secure and is vulnerable to attacks. In this paper, we propose an l-diversity anatomy de-identification method that can overcome the limitations of k-anonymity and guarantee stronger privacy protection than k-anonymization. Further, as this data anonymization process is computationally time-intensive, we use Spark distributed computing to provide rapid de-identification to enhance its utility. We also propose l-diversity preservation for dynamically evolving RDF datasets. Experimental results show that our proposed distributed l-diversity algorithm processes the data more efficiently than conventional approaches.
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-020-03583-6