Utility-Aware Optimal Data Selection for Differentially Private Federated Learning in IoV

Bibliographic details
Published in:	IEEE Internet of Things Journal 2024-10, Vol.11 (20), p.33326-33336
Main authors:	Zhang, Jiancong; Li, Shining; Wang, Changhao
Format:	Article
Language:	English
Subjects:	Adaptive sampling; Algorithms; Convergence; Cost function; Data models; Datasets; Design optimization; differential privacy; Federated learning; Heterogeneity; Internet of Vehicles; Noise; noisy gradient descent (NGD); Optimization; Privacy; Sensitivity; Training; utility evaluation

Description
Summary:	Federated learning trains models on coordinated, distributed data sets, so data selection has a significant impact on model performance. Personalized differential privacy, however, introduces heterogeneity into vehicular data sets: stronger privacy protection may reduce the contribution of local models to model convergence. The goal of this article is therefore to dynamically optimize the combination of data sets to tackle this heterogeneity in differentially private federated learning in the Internet of Vehicles. This is extremely challenging without direct data access and a visible training process. We therefore propose an efficient hierarchical data selection method. First, utility is evaluated using the convergence bound derived from the noise function and the cost function. Accordingly, a collection of high-value clients is selected to maximize the potential contribution of the combination to the global model. Then, we design an optimization function based on the unknown variables within the convergence bound and develop a low-complexity algorithm to approximate the sampling probability. Meanwhile, the aggregation weight of each model is adjusted to ensure unbiased estimation. Experimental results on two real-world trajectory data sets show that the scheme can reduce the meter error by 8.90% and 15.97%, respectively, and improve the convergence speed by 23.9% and 27.1%, respectively.
ISSN:	2327-4662
DOI:	10.1109/JIOT.2024.3427132
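
Illustrative note: the summary states that the aggregation weight of each model is adjusted to ensure unbiased estimation when clients are sampled with unequal probabilities. This record does not give the article's algorithm, so the following is only a minimal sketch of one standard way to achieve that property, inverse-probability weighting, and it assumes each client is selected independently. All function names, the updates, the weights, and the probabilities are hypothetical values chosen for illustration.

```python
import itertools
import random


def sample_clients(probs, rng):
    """Select each client independently with its own probability (an assumption)."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def aggregate(updates, weights, probs, selected):
    """Sum the selected clients' updates, each scaled by weight / probability."""
    dim = len(updates[0])
    total = [0.0] * dim
    for i in selected:
        scale = weights[i] / probs[i]  # rarely sampled clients count for more
        for k in range(dim):
            total[k] += scale * updates[i][k]
    return total


def full_aggregate(updates, weights):
    """Reference value: the weighted sum over all clients, with no sampling."""
    dim = len(updates[0])
    return [sum(w * u[k] for w, u in zip(weights, updates)) for k in range(dim)]


def expected_aggregate(updates, weights, probs):
    """Exact expectation of aggregate(), enumerating every possible selection."""
    n, dim = len(updates), len(updates[0])
    expectation = [0.0] * dim
    for mask in itertools.product([0, 1], repeat=n):
        prob = 1.0
        for bit, p in zip(mask, probs):
            prob *= p if bit else 1.0 - p
        selected = [i for i, bit in enumerate(mask) if bit]
        agg = aggregate(updates, weights, probs, selected)
        for k in range(dim):
            expectation[k] += prob * agg[k]
    return expectation


# Hypothetical toy data: three clients, two-dimensional model updates.
updates = [[1.0, -2.0], [0.5, 0.5], [-1.5, 3.0]]
weights = [0.5, 0.3, 0.2]   # e.g. relative data-set sizes
probs = [0.9, 0.6, 0.3]     # per-client sampling probabilities

selected = sample_clients(probs, random.Random(0))
one_round = aggregate(updates, weights, probs, selected)
```

A single round's result (`one_round`) generally differs from `full_aggregate(updates, weights)`, but `expected_aggregate(updates, weights, probs)` matches it up to floating-point error, which is what "unbiased" means here. How the article chooses the sampling probabilities themselves is not described in this record.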