Multiparty Data Publishing via Blockchain and Differential Privacy

Bibliographic Details
Published in: Security and Communication Networks 2022-05, Vol. 2022, p. 1-13
Authors: Gu, Zhen; Zhang, Kejia; Zhang, Guoyin
Format: Article
Language: English
Online access: Full text
Description
Summary: Data are distributed among different parties, and collecting data from multiple parties for analysis and mining can serve people better. However, it also brings unprecedented privacy threats to the participants. Therefore, safe and reliable data publishing among multiple data owners is an urgent problem to be solved. We mainly study the problem of privacy protection in data publishing. For the centralized scenario, we propose the LDA-DP algorithm. First, the within-class mean vectors and the pooled within-class scatter matrix are perturbed with Gaussian noise. Second, the optimal projection direction vector with differential privacy is obtained via the Fisher criterion. Finally, the low-dimensional projection of the original data is obtained. For the distributed scenario, we propose the Mul-LDA-DP algorithm based on blockchain and differential privacy. First, the within-class mean vectors and within-class scatter matrices of the local data are perturbed with Gaussian noise and uploaded to the blockchain network. Second, the projection direction vector is computed in the blockchain network and returned to the data owners. Finally, each data owner uses the projection direction vector to generate the low-dimensional projection of its original data and uploads it to the blockchain network for publishing. Furthermore, for the distributed scenario, we propose a correlated noise generation scheme that uses the additivity of the Gaussian distribution to mitigate the effect of the noise and achieves the same noise level as the centralized scenario. We measure the utility of the published data by the SVM misclassification rate. We conduct comparative experiments with similar algorithms on different real data sets. The experimental results show that the data released by the two algorithms maintain good utility for SVM classification.
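
The sketch below illustrates the centralized LDA-DP steps summarized in the abstract (Gaussian perturbation of the class means and of the pooled within-class scatter, followed by a Fisher-criterion direction). The function name, the two-class setup, the symmetrized noise matrix, and the noise scale sigma are illustrative assumptions; the paper's actual calibration of the Gaussian mechanism to a target (ε, δ) is not reproduced here.

```python
import numpy as np

def lda_dp_direction(X, y, sigma, rng=None):
    """Illustrative two-class DP Fisher direction (a sketch, not the paper's exact method).

    X : (n, d) data matrix; y : labels in {0, 1}.
    sigma : Gaussian noise scale, assumed pre-calibrated elsewhere from the
            sensitivity of the perturbed statistics.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]

    # Step 1: perturb each within-class mean vector with Gaussian noise,
    # and accumulate the pooled within-class scatter matrix.
    means = {}
    Sw = np.zeros((d, d))
    for c in (0, 1):
        Xc = X[y == c]
        means[c] = Xc.mean(axis=0) + rng.normal(0.0, sigma, size=d)
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))

    # Perturb the pooled scatter with a symmetrized Gaussian noise matrix
    # so the noisy scatter stays symmetric.
    N = rng.normal(0.0, sigma, size=(d, d))
    Sw_noisy = Sw + (N + N.T) / 2.0

    # Step 2: Fisher criterion, w ∝ Sw^{-1} (mu0 - mu1); a small ridge term
    # keeps the noisy scatter matrix invertible.
    w = np.linalg.solve(Sw_noisy + 1e-6 * np.eye(d), means[0] - means[1])
    return w / np.linalg.norm(w)

# Step 3: publish the low-dimensional (here one-dimensional) projection, Z = X @ w.
```

For the distributed Mul-LDA-DP setting, the correlated noise scheme mentioned in the abstract rests on the additivity of the Gaussian distribution: if each of K parties perturbs its local statistic with independent N(0, sigma^2 / K) noise (i.e., noise scale sigma / sqrt(K) per party), the aggregated statistic carries N(0, sigma^2) noise in total, matching the noise level of the centralized mechanism.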
ISSN: 1939-0114, 1939-0122
DOI: 10.1155/2022/5612794