Federated Learning with complete service commitment of data heterogeneity

Bibliographic details
Published in: Knowledge-based systems 2025-02, Vol. 310, p. 112937, Article 112937
Main authors: Zhou, Yizhi; Wang, Junxiao; Qin, Yuchen; Kong, Xiangyu; Xie, Xin; Qi, Heng; Zeng, Deze
Format: Article
Language: English
Description
Abstract: Federated Learning (FL) systems grapple with statistical data heterogeneity, primarily manifested as non-IID label distribution skew and quantity skew. Label skew refers to the uneven distribution of labels across clients, while quantity skew pertains to disparities in the amount of data held by each client. Despite significant advancements, existing FL frameworks, many of which are open-source, have predominantly addressed label skew, with limited success in managing quantity skew. This paper demonstrates empirically that such frameworks make only an incomplete commitment to data heterogeneity, which leads to underperformance due to unaddressed quantity skew. In response, we propose a novel taxonomy that distinguishes between “heterogeneity without quantity skew” (WQS) and “heterogeneity amplified by quantity skew” (AQS), the latter of which characterizes our complete service commitment. Our findings indicate that quantity skew can lead to a notable decline in model performance, particularly for clients with less data, and contributes to redundancy effects in clients with abundant data, where the marginal utility of additional data diminishes. Furthermore, we introduce FedED, a theoretical framework that calculates effective data counts in a model-independent and loss-agnostic manner and integrates these counts into the server’s weighted aggregation process. This methodology, enhanced by an effective-sample-based client sampling strategy, significantly improves model performance by addressing label and quantity skew concurrently. Extensive experiments validate that our approach outperforms existing methods and integrates seamlessly with current frameworks to further improve FL robustness, thereby offering a holistic solution to the challenges posed by data heterogeneity in FL systems.
Highlights:
• We explored heterogeneity amplified by quantity skew (AQS) in FL, revealing sharp accuracy drops.
• We proposed FedED, a novel framework inspired by random coverage to address AQS in FL.
• FedED introduces effective sample sizes into weighted aggregation and client sampling strategies.
• Experiments show that FedED improves FL models under heterogeneous, skewed conditions.
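
Illustrative sketch (not part of the catalog record): the abstract says that FedED replaces raw data counts with effective data counts in the server's weighted aggregation and in client sampling, but this record does not give the formula. The Python sketch below is therefore only an assumption-laden illustration. It borrows the effective-number formula (1 - beta^n) / (1 - beta) from the random-coverage literature as a stand-in, and the names `effective_count`, `aggregation_weights`, `aggregate`, `sample_clients`, and the value `beta=0.999` are hypothetical, not taken from the paper.

```python
import random


def effective_count(n, beta=0.999):
    """Stand-in effective data count: (1 - beta**n) / (1 - beta).

    ASSUMPTION: this formula and beta are illustrative; the record does
    not state how FedED computes effective counts.
    """
    if n <= 0:
        return 0.0
    return (1.0 - beta ** n) / (1.0 - beta)


def aggregation_weights(client_sizes, beta=0.999):
    """Normalize effective counts so they sum to 1 across clients."""
    eff = [effective_count(n, beta) for n in client_sizes]
    total = sum(eff)
    return [e / total for e in eff]


def aggregate(client_params, weights):
    """Weighted average of flat parameter vectors (lists of floats)."""
    dim = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(dim)
    ]


def sample_clients(client_sizes, k, beta=0.999, seed=None):
    """Draw k distinct clients with probability proportional to their
    effective counts (sequential weighted draws without replacement)."""
    rng = random.Random(seed)
    weights = aggregation_weights(client_sizes, beta)
    remaining = list(range(len(client_sizes)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    sizes = [10, 500, 10000]  # hypothetical per-client data counts
    print(aggregation_weights(sizes))
    print(sample_clients(sizes, k=2, seed=0))
```

Under this stand-in formula the effective count saturates as n grows, so a client with abundant data receives less weight than plain size-proportional averaging would give it. That mirrors the diminishing marginal utility the abstract describes, although the paper's actual weighting may differ.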
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112937