Estimation of cost of k–anonymity in the number of dummy records
Published in: | Journal of ambient intelligence and humanized computing 2023-12, Vol.14 (12), p.15885-15894 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | De-identification is a process to prevent individuals from being identified from original transaction data by processing personal identification information. k-anonymization, which processes data so that at least k users have the same records, is one of the representative methods of de-identification. One of the methods of k-anonymization is adding dummy records into the data to protect users who have unique histories. For this method, the cost of k-anonymization is the difference in the number of records between the original data and the processed data, and it can be calculated only after deciding the parameter k and processing the data. However, we want to calculate the cost before processing and find the optimal value of k, because processing big data with various values of k is very costly. In this paper, we propose a new model of transaction data that gives us a probability distribution and an expected value of values in data under the assumption that all values occur independently with uniform probability. Applying our data model, it is possible to evaluate the cost of k-anonymized data even before processing. |
---|---|
ISSN: | 1868-5137 1868-5145 |
DOI: | 10.1007/s12652-021-03369-5 |
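
The cost notion described in the abstract (the number of dummy records added to reach k-anonymity) can be illustrated with a small Python sketch. This is not the authors' model. It assumes a deliberately simplified setting in which each record is a single hashable value, and a value is "protected" once it occurs at least k times. The function names `dummy_cost` and `expected_dummy_cost`, and the parameters `n` (number of records) and `m` (number of possible values), are hypothetical and chosen only for illustration. The second function uses the abstract's stated assumption that values occur independently with uniform probability, so that the count of each value is binomial.

```python
from collections import Counter
from math import comb


def dummy_cost(records, k):
    """Count the dummy records needed so that every value that occurs
    in `records` occurs at least k times (simplified, assumed setting)."""
    counts = Counter(records)
    # A value seen c < k times needs k - c dummy copies.
    return sum(k - c for c in counts.values() if c < k)


def expected_dummy_cost(n, m, k):
    """Expected dummy cost before processing, assuming n records drawn
    independently and uniformly from m possible values (an assumption
    made for this sketch, not the paper's full transaction model)."""
    p = 1.0 / m
    # The count of one fixed value is Binomial(n, p); only counts
    # 1..k-1 incur a cost, since absent values need no protection.
    per_value = sum(
        (k - c) * comb(n, c) * p**c * (1 - p) ** (n - c)
        for c in range(1, k)
    )
    # By linearity of expectation, sum over all m values.
    return m * per_value
```

Under these assumptions, `dummy_cost` requires the data to be processed for a given k, whereas `expected_dummy_cost` needs only `n`, `m`, and `k`, which mirrors the motivation of estimating the cost before anonymizing.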