Beyond model splitting: Preventing label inference attacks in vertical federated learning with dispersed training

Bibliographic details
Published in:	World wide web (Bussum) 2023-09, Vol.26 (5), p.2691-2707
Main authors:	Wang, Yilei; Lv, Qingzhe; Zhang, Huang; Zhao, Minghao; Sun, Yuhong; Ran, Lingkai; Li, Tao
Format:	Article
Language:	English
Subjects:	Computer centers; Computer Science; Database Management; Datasets; Dispersion; Federated learning; Inference; Information sharing; Information Systems Applications (incl. Internet); Labels; Machine learning; Operating Systems; Organizations; Privacy; Security; Special Issue on Privacy and Security in Machine Learning; Splitting; World Wide Web
Online access:	Full text

Description
Abstract:	Federated learning is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data. As an important variant, vertical federated learning (VFL) deals with cases in which collaborating organizations own data on the same set of users but with disjoint features. VFL is generally regarded as more secure than horizontal federated learning. However, recent research (USENIX Security’22) reveals that label inference attacks are still possible in VFL: an attacker can acquire the privately owned labels of other participants, and even VFL constructed with model splitting (the VFL architecture with the stronger security guarantee) cannot escape this. To solve this issue, in this paper, we propose the dispersed training framework. It uses secret sharing to break the correlations between the bottom model and the training data. Accordingly, even if the attacker receives the gradients in the training phase, they cannot deduce the feature representation of labels from the bottom model. In addition, we design a customized model aggregation method so that the shared model can be privately combined, and the linearity of secret sharing schemes ensures that training accuracy is preserved. Theoretical and experimental analyses indicate the satisfactory performance and effectiveness of our framework.
ISSN:	1386-145X 1573-1413
DOI:	10.1007/s11280-023-01159-x
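
The abstract attributes the preserved training accuracy to the linearity of secret sharing. The sketch below illustrates that property with generic additive secret sharing over a prime field. It is not the scheme from the article: the modulus PRIME, the fixed-point factor SCALE, and all function names are assumptions chosen for illustration only.

```python
import secrets
from typing import List

PRIME = 2**61 - 1   # field modulus (assumed for this sketch)
SCALE = 10**6       # fixed-point factor for real-valued weights (assumed)


def encode(x: float) -> int:
    """Map a real value to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a signed real value."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: int, n: int) -> List[int]:
    """Split a field element into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recombine additive shares into the original field element."""
    return sum(shares) % PRIME


def add_shares(a: List[int], b: List[int]) -> List[int]:
    """Add two shared values share by share, without reconstructing either."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


# Two hypothetical weights, each split among three parties.
w1_shares = share(encode(0.25), 3)
w2_shares = share(encode(-1.5), 3)

# Each party adds only its own shares; the sum of the results
# reconstructs to the sum of the plaintext weights.
combined = add_shares(w1_shares, w2_shares)
print(decode(reconstruct(combined)))  # -1.25
```

Because reconstruction is a sum, any linear combination computed on shares reconstructs to the same linear combination of the underlying values, which is why a linear aggregation step can run on shares without changing the result.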