Contrastive Graph Condensation: Advancing Data Versatility through Self-Supervised Learning
Saved in:
Main Authors: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | With the increasing computational cost of training graph neural
networks (GNNs) on large-scale graphs, graph condensation (GC) has emerged as a
promising solution that synthesizes a compact substitute graph of the
large-scale original graph for efficient GNN training. However, existing GC
methods predominantly employ classification as the surrogate task for
optimization, thus relying heavily on node labels and constraining their
utility in label-scarce scenarios. More
critically, this surrogate task tends to overfit class-specific information
within the condensed graph, consequently restricting the generalization
capabilities of GC for other downstream tasks. To address these challenges, we
introduce Contrastive Graph Condensation (CTGC), which adopts a self-supervised
surrogate task to extract critical, causal information from the original graph
and enhance the cross-task generalizability of the condensed graph.
Specifically, CTGC employs a dual-branch framework to disentangle the
generation of the node attributes and graph structures, where a dedicated
structural branch is designed to explicitly encode geometric information
through nodes' positional embeddings. By implementing an alternating
optimization scheme with contrastive loss terms, CTGC promotes the mutual
enhancement of both branches and facilitates high-quality graph generation
through the model inversion technique. Extensive experiments demonstrate that
CTGC excels in handling various downstream tasks with a limited number of
labels, consistently outperforming state-of-the-art GC methods. |
DOI: | 10.48550/arxiv.2411.17063 |
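
The abstract describes a dual-branch design with cross-branch contrastive alignment and alternating optimization. Below is a minimal PyTorch-style sketch of that idea, not the authors' implementation: the `Branch` MLP encoders, the InfoNCE form of the contrastive loss, and the use of Laplacian-eigenvector positional embeddings for the structural branch are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """Two-layer MLP encoder; one instance per branch (attribute / structural)."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, out_dim)
        )

    def forward(self, x):
        return self.net(x)

def info_nce(z1, z2, tau=0.5):
    """Cross-branch InfoNCE: node i's two views are positives, all others negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau            # (N, N) cosine-similarity logits
    labels = torch.arange(z1.size(0))     # matching pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

def positional_embeddings(adj, k=16):
    """Illustrative geometric features: the k lowest Laplacian eigenvectors."""
    lap = torch.diag(adj.sum(1)) - adj
    _, vecs = torch.linalg.eigh(lap)
    return vecs[:, :k]

# Toy data: N nodes with d-dim attributes and a random symmetric adjacency.
N, d, k, out_dim = 64, 32, 16, 8
x = torch.randn(N, d)
adj = (torch.rand(N, N) < 0.05).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)
pos = positional_embeddings(adj, k)

attr_branch = Branch(d, 64, out_dim)    # encodes node attributes
struct_branch = Branch(k, 64, out_dim)  # encodes positional/geometric information
opt_a = torch.optim.Adam(attr_branch.parameters(), lr=1e-3)
opt_s = torch.optim.Adam(struct_branch.parameters(), lr=1e-3)

for step in range(200):
    za, zs = attr_branch(x), struct_branch(pos)
    if step % 2 == 0:                     # even steps: update attribute branch
        loss = info_nce(za, zs.detach())
        opt_a.zero_grad(); loss.backward(); opt_a.step()
    else:                                 # odd steps: update structural branch
        loss = info_nce(za.detach(), zs)
        opt_s.zero_grad(); loss.backward(); opt_s.step()
```

The `detach()` calls implement the alternating scheme: each half-step treats the other branch's embeddings as fixed targets, so the two branches take turns pulling toward one another.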
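The abstract also mentions generating the condensed graph via model inversion. The sketch below is again an assumption-laden illustration rather than the paper's procedure: it inverts a frozen encoder by optimizing synthetic node attributes until their embeddings match target embeddings, with crude k-means centroids of the original embeddings standing in as targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen stand-in encoder; in CTGC this role would be played by a trained branch.
enc = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
for p in enc.parameters():
    p.requires_grad_(False)

x = torch.randn(256, 32)  # toy stand-in for original-graph node attributes

@torch.no_grad()
def embedding_targets(encoder, feats, n_cond, iters=10):
    """Pick targets as crude k-means centroids of the frozen embeddings."""
    z = encoder(feats)
    cent = z[torch.randperm(z.size(0))[:n_cond]].clone()
    for _ in range(iters):
        assign = torch.cdist(z, cent).argmin(dim=1)
        for j in range(n_cond):
            if (assign == j).any():
                cent[j] = z[assign == j].mean(dim=0)
    return cent

def invert(encoder, targets, in_dim, steps=300, lr=0.1):
    """Model inversion: optimize synthetic inputs to reproduce target embeddings."""
    x_syn = torch.randn(targets.size(0), in_dim, requires_grad=True)
    opt = torch.optim.Adam([x_syn], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(encoder(x_syn), targets)
        opt.zero_grad(); loss.backward(); opt.step()
    return x_syn.detach()

targets = embedding_targets(enc, x, n_cond=8)
x_cond = invert(enc, targets, in_dim=32)   # condensed node attributes
```

A structural counterpart would invert the structural branch analogously to recover the condensed graph's geometry; that step is omitted here.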