A Communication-Efficient Hierarchical Federated Learning Framework via Shaping Data Distribution at Edge
Published in: | IEEE/ACM transactions on networking 2024-06, Vol.32 (3), p.2600-2615 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Federated learning (FL) enables collaborative model training over distributed computing nodes without sharing their privacy-sensitive raw data. However, in FL, iterative exchanges of model updates between distributed nodes and the cloud server can result in significant communication cost, especially when the data distributions at distributed nodes are imbalanced, which requires more rounds of iterations. In this paper, with our in-depth empirical studies, we disclose that extensive cloud aggregations can be avoided without compromising the learning accuracy if frequent aggregations can be enabled at the edge network. To this end, we shed light on the hierarchical federated learning (HFL) framework, where a subset of distributed nodes can act as edge aggregators to support edge aggregations. Under the HFL framework, we formulate a communication cost minimization (CCM) problem to minimize the total communication cost required for model learning with a target accuracy by making decisions on edge aggregator selection and node-edge associations. Inspired by our data-driven insights that the potential of HFL lies in the data distribution at edge aggregators, we propose ShapeFL, i.e., SHaping dAta distRibution at Edge, to transform and solve the CCM problem. In ShapeFL, we divide the original problem into two sub-problems to minimize the per-round communication cost and maximize the data distribution diversity of edge aggregator data, respectively, and devise two light-weight algorithms to solve them accordingly. Extensive experiments are carried out based on several open datasets and real-world network topologies, and the results demonstrate the efficacy of ShapeFL in terms of both learning accuracy and communication efficiency. |
---|---|
ISSN: | 1063-6692 1558-2566 |
DOI: | 10.1109/TNET.2024.3363916 |
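
The two-level aggregation described in the summary can be illustrated with a minimal Python sketch. It is not the ShapeFL algorithm: it only shows generic sample-weighted averaging, first at each edge aggregator and then at the cloud. The function names, the `groups` layout, and the toy parameter vectors are all hypothetical, and the node-edge association is assumed to be given rather than optimized.

```python
def weighted_average(models, weights):
    """Average parameter vectors, weighting each by its sample count."""
    total = sum(weights)
    dim = len(models[0])
    return [
        sum(m[i] * w for m, w in zip(models, weights)) / total
        for i in range(dim)
    ]


def hierarchical_aggregate(groups):
    """Aggregate node models at each edge, then aggregate edges at the cloud.

    groups maps a (hypothetical) edge aggregator id to a list of
    (parameter_vector, n_samples) pairs for its associated nodes.
    """
    edge_models, edge_sizes = [], []
    for members in groups.values():
        params = [p for p, _ in members]
        sizes = [n for _, n in members]
        # Edge aggregation: cheap, can run frequently.
        edge_models.append(weighted_average(params, sizes))
        edge_sizes.append(sum(sizes))
    # Cloud aggregation: expensive, runs less often in a hierarchical setup.
    return weighted_average(edge_models, edge_sizes)


# Toy association: two nodes under edge "e1", one node under edge "e2".
groups = {
    "e1": [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    "e2": [([5.0, 6.0], 60)],
}
global_model = hierarchical_aggregate(groups)
print(global_model)  # [4.0, 5.0]
```

With sample-count weights at both levels, a single edge-then-cloud pass gives the same result as averaging all nodes directly at the cloud. The communication saving the summary refers to comes from running several edge rounds between cloud rounds, which this sketch does not model.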