From Sancus to Sancus$$^q$$: staleness and quantization-aware full-graph decentralized training in graph neural networks
| Field | Value |
|---|---|
| Published in | The VLDB Journal, 2025-03, Vol. 34 (2), Article 22 |
| Main authors | |
| Format | Article |
| Language | English |
| Online access | Full text |
Abstract: Graph neural networks (GNNs) have emerged due to their success at modeling graph data. Yet, it is challenging for GNNs to efficiently scale to large graphs, so distributed GNNs come into play. To avoid the communication caused by expensive data movement between workers, we propose Sancus and its advanced version Sancus$$^q$$, a staleness- and quantization-aware communication-avoiding decentralized GNN system. By introducing a set of novel bounded embedding staleness metrics and adaptively skipping broadcasts, Sancus abstracts decentralized GNN processing as sequential matrix multiplication and reuses historical embeddings via a cache. To further reduce the communication volume, Sancus$$^q$$ performs quantization-aware communication on embeddings, shrinking the size of broadcast messages. Theoretically, we show bounded approximation errors of embeddings and gradients with a known fastest convergence guarantee. Empirically, we evaluate Sancus and Sancus$$^q$$ with common GNN models under different system setups on large-scale benchmark datasets. Compared to SOTA works, Sancus$$^q$$ avoids up to $$86\%$$ of communication with $$3.0\times$$ faster throughput on average, without accuracy loss.
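The abstract names two mechanisms: skipping a broadcast while the staleness of cached historical embeddings stays within a bound, and quantizing the embeddings that are actually broadcast. The Python sketch below only illustrates these two ideas; the class name, the drift metric, and the `epsilon` threshold are illustrative assumptions, not the paper's actual API or algorithm.

```python
import numpy as np

# Illustrative sketch (not the authors' code): each worker caches the
# embeddings it last broadcast. A new broadcast is skipped while the drift
# between fresh local embeddings and the cached copy stays within a bound;
# otherwise the fresh embeddings are quantized to int8 before being sent,
# mimicking the smaller broadcast messages of Sancus^q.

def quantize_int8(x):
    # Symmetric per-tensor quantization: float32 -> (int8 payload, float scale).
    scale = float(np.abs(x).max()) / 127.0 or 1.0
    return (x / scale).round().astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

class StalenessAwareBroadcaster:
    def __init__(self, epsilon=0.05):
        self.epsilon = epsilon   # tolerated relative staleness (assumed metric)
        self.cached = None       # historical embeddings from the last broadcast

    def _drift(self, fresh):
        # Relative change of the fresh embeddings w.r.t. the cached copy.
        return np.linalg.norm(fresh - self.cached) / (np.linalg.norm(self.cached) + 1e-12)

    def maybe_broadcast(self, fresh):
        """Return (int8 payload, scale) to send, or None if the broadcast is skipped."""
        if self.cached is not None and self._drift(fresh) <= self.epsilon:
            return None          # peers keep using the stale cached embeddings
        self.cached = fresh.copy()
        return quantize_int8(fresh)
```

In this reading, skipped rounds keep peers on cached (stale) embeddings, which the paper bounds in approximation error, while the rounds that do communicate send compressed payloads; the combination is what yields the reported reduction in communication volume.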
ISSN: 1066-8888, 0949-877X
DOI: 10.1007/s00778-024-00897-2