Mixture of Weak & Strong Experts on Graphs
Format: Article
Language: English
Abstract: Realistic graphs contain both (1) rich self-features of nodes and (2) informative neighborhood structures, which a Graph Neural Network (GNN) handles jointly in the typical setup. We propose to decouple the two modalities via a Mixture of weak and strong experts (Mowst), where the weak expert is a lightweight Multi-layer Perceptron (MLP) and the strong expert is an off-the-shelf GNN. To adapt the experts' collaboration to different target nodes, we propose a "confidence" mechanism based on the dispersion of the weak expert's prediction logits. The strong expert is conditionally activated in the low-confidence region, i.e., when either the node's classification relies on neighborhood information or the weak expert has low model quality. Analyzing the influence of the confidence function on the loss reveals interesting training dynamics: our training algorithm encourages the specialization of each expert by effectively generating a soft splitting of the graph. In addition, our "confidence" design imposes a desirable bias toward the strong expert, benefiting from the GNN's better generalization capability. Mowst is easy to optimize and achieves strong expressive power, with a computation cost comparable to a single GNN. Empirically, Mowst on 4 backbone GNN architectures shows significant accuracy improvement on 6 standard node classification benchmarks, including both homophilous and heterophilous graphs (https://github.com/facebookresearch/mowst-gnn).
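The gating idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dispersion measure (variance of the softmax probabilities), the threshold, and the expert interfaces are all assumptions, and the actual Mowst training mixes the experts softly via the confidence rather than hard-thresholding at inference.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def confidence(logits):
    # Dispersion of the weak expert's prediction logits, measured here
    # as the variance of the softmax probabilities: near-uniform
    # predictions give low confidence, peaked ones give high confidence.
    # (Illustrative choice; the paper's exact dispersion function may differ.)
    return float(np.var(softmax(logits)))

def mowst_predict(x, weak_mlp, strong_gnn, threshold=0.01):
    # The weak expert (a cheap MLP) runs first on the node's self-features.
    weak_logits = weak_mlp(x)
    if confidence(weak_logits) >= threshold:
        # High confidence: keep the weak expert's answer.
        return softmax(weak_logits)
    # Low confidence: activate the strong expert (an off-the-shelf GNN),
    # which also aggregates the node's neighborhood.
    return softmax(strong_gnn(x))
```

With peaked weak-expert logits the MLP's prediction is returned directly; with near-uniform logits the call falls through to the GNN, matching the conditional activation in the low-confidence region.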
DOI: 10.48550/arxiv.2311.05185