Hierarchical Graph Transformer with Adaptive Node Sampling
Format: Article
Language: English
Online access: Order full text
Summary: The Transformer architecture has achieved remarkable success in a number of domains, including natural language processing and computer vision. However, when it comes to graph-structured data, Transformers have not achieved competitive performance, especially on large graphs. In this paper, we identify the main deficiencies of current graph Transformers: (1) existing node sampling strategies in Graph Transformers are agnostic to the graph characteristics and the training process; (2) most sampling strategies focus only on local neighbors and neglect long-range dependencies in the graph. We conduct experimental investigations on synthetic datasets to show that existing sampling strategies are sub-optimal. To tackle these problems, we formulate the optimization of node sampling in the Graph Transformer as an adversarial bandit problem, where the rewards are related to the attention weights and can vary over the course of training. Meanwhile, we propose a hierarchical attention scheme with graph coarsening to capture long-range interactions while reducing computational complexity. Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of our method over existing graph Transformers and popular GNNs.
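To make the bandit formulation concrete, here is a minimal sketch (not the authors' code) of adaptive node sampling as an adversarial bandit: each candidate node is an arm, and after a training step the attention weight it received is fed back as its reward, so frequently-attended nodes are sampled more often. It assumes a standard EXP3 update; the class and method names (`Exp3NodeSampler`, `sample_nodes`, `update_rewards`) and hyperparameters are illustrative, not the paper's API.

```python
import numpy as np

class Exp3NodeSampler:
    """EXP3-style sampler over a node's candidate set (illustrative sketch)."""

    def __init__(self, num_candidates: int, gamma: float = 0.1):
        self.n = num_candidates
        self.gamma = gamma              # exploration rate
        self.weights = np.ones(self.n)  # one arm per candidate node

    def probabilities(self) -> np.ndarray:
        # Mix the weight-proportional distribution with uniform exploration.
        w = self.weights / self.weights.sum()
        return (1.0 - self.gamma) * w + self.gamma / self.n

    def sample_nodes(self, k: int) -> np.ndarray:
        # Draw k candidate nodes (without replacement) for this training step.
        return np.random.choice(self.n, size=k, replace=False,
                                p=self.probabilities())

    def update_rewards(self, sampled: np.ndarray, attn_weights: np.ndarray):
        # Reward each sampled node with the attention weight it received,
        # importance-weighted by its sampling probability (standard EXP3),
        # which keeps the reward estimate unbiased as rewards drift during
        # training.
        p = self.probabilities()
        for idx, r in zip(sampled, attn_weights):
            r_hat = r / p[idx]
            self.weights[idx] *= np.exp(self.gamma * r_hat / self.n)
        self.weights /= self.weights.max()  # rescale to avoid overflow

# Usage: sample 16 of 1024 candidates, then feed back attention weights.
sampler = Exp3NodeSampler(num_candidates=1024)
nodes = sampler.sample_nodes(k=16)
attn = np.random.dirichlet(np.ones(16))  # stand-in for real attention weights
sampler.update_rewards(nodes, attn)
```

Because the rewards come from the model's own attention, the sampling distribution adapts to both the graph and the training process, which is exactly the deficiency (1) above that static samplers cannot address.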
DOI: 10.48550/arxiv.2210.03930