G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks
Main Authors: 
Format: Article
Language: eng
Subjects: 
Online Access: Order full text
Abstract: Transferring the knowledge of large-scale pre-trained models to various
downstream tasks by fine-tuning all model parameters has become a popular
paradigm. However, as models grow and the number of downstream tasks rises,
this paradigm inevitably runs into challenges of computation cost and memory
footprint. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods (e.g.,
Adapter, LoRA, BitFit) have emerged as a promising way to alleviate these
concerns by updating only a small portion of the parameters. Although these
PEFT methods have demonstrated satisfactory performance in natural language
processing, it remains under-explored whether they can be transferred to
graph-based tasks with Graph Transformer Networks (GTNs). In this paper, we
fill this gap by providing extensive benchmarks of traditional PEFT methods on
a range of graph-based downstream tasks. Our empirical study shows that
directly transferring existing PEFT methods to graph-based tasks is sub-optimal
due to feature distribution shift. To address this issue, we propose a novel
structure-aware PEFT approach, named G-Adapter, which leverages a graph
convolution operation to introduce graph structure (e.g., the graph adjacency
matrix) as an inductive bias that guides the updating process. In addition, we
propose Bregman proximal point optimization to further alleviate feature
distribution shift by preventing the model from aggressive updates. Extensive
experiments demonstrate that G-Adapter achieves state-of-the-art performance
against its counterparts on nine graph benchmark datasets with two pre-trained
GTNs, and delivers substantial memory-footprint savings compared to the
conventional full fine-tuning paradigm.
DOI: 10.48550/arxiv.2305.10329
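
To make the mechanism described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of a structure-aware bottleneck adapter: hidden features are down-projected, propagated once over the symmetrically normalized adjacency matrix (a graph convolution), and up-projected back with a residual connection, so only the small adapter is trained while the pre-trained GTN stays frozen. Names such as `GraphAdapter` and `normalized_adjacency` are illustrative and not taken from the paper, and the actual G-Adapter layout may place the graph convolution differently.

```python
# Illustrative sketch only -- not the authors' reference implementation.
import torch
import torch.nn as nn


def normalized_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetrically normalized adjacency with self-loops: D^{-1/2}(A + I)D^{-1/2}."""
    a_hat = adj + torch.eye(adj.size(-1), device=adj.device)
    d_inv_sqrt = a_hat.sum(dim=-1).clamp(min=1e-12).pow(-0.5)
    return d_inv_sqrt.unsqueeze(-1) * a_hat * d_inv_sqrt.unsqueeze(-2)


class GraphAdapter(nn.Module):
    """Bottleneck adapter whose inner update is propagated along graph edges,
    so the adjacency matrix acts as an inductive bias (hypothetical layout)."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        nn.init.zeros_(self.up.weight)  # start as a (near-)identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, hidden_dim), adj_norm: (num_nodes, num_nodes)
        h = self.down(x)          # down-project into the bottleneck
        h = adj_norm @ h          # graph convolution: mix features along edges
        h = self.up(self.act(h))  # up-project back to the hidden size
        return x + h              # residual: the frozen backbone output is preserved


if __name__ == "__main__":
    num_nodes, hidden_dim = 6, 32
    x = torch.randn(num_nodes, hidden_dim)             # node features from a frozen GTN layer
    adj = (torch.rand(num_nodes, num_nodes) > 0.7).float()
    adj = ((adj + adj.T) > 0).float()                   # make the toy graph undirected
    adapter = GraphAdapter(hidden_dim, bottleneck_dim=8)
    out = adapter(x, normalized_adjacency(adj))
    print(out.shape)                                    # torch.Size([6, 32])
```

Because the up-projection starts at zero, the adapter initially leaves the pre-trained features untouched and only gradually injects structure-aware corrections as it is trained.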
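The Bregman proximal point idea mentioned in the abstract can likewise be sketched as an extra penalty on each training step: the task loss is augmented with a divergence between the current model's predictions and those of a slowly updated copy of the previous iterate, which discourages aggressive updates. The helpers below, `symmetric_kl` and `bregman_proximal_step`, are an assumption-laden illustration rather than the paper's implementation; the exact divergence and update schedule used in G-Adapter may differ.

```python
# Illustrative sketch only -- the paper's exact Bregman formulation may differ.
import copy
import torch
import torch.nn.functional as F


def symmetric_kl(logits_p: torch.Tensor, logits_q: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between two categorical distributions given as logits."""
    p, q = F.log_softmax(logits_p, dim=-1), F.log_softmax(logits_q, dim=-1)
    return (F.kl_div(q, p, log_target=True, reduction="batchmean")
            + F.kl_div(p, q, log_target=True, reduction="batchmean"))


def bregman_proximal_step(model, prev_model, optimizer, x, y, mu=1.0, beta=0.99):
    """One training step with a Bregman proximal point penalty (hypothetical helper)."""
    logits = model(x)
    with torch.no_grad():
        prev_logits = prev_model(x)  # predictions of the previous iterate
    loss = F.cross_entropy(logits, y) + mu * symmetric_kl(logits, prev_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Track the previous iterate with an exponential moving average of the weights.
    with torch.no_grad():
        for p_prev, p in zip(prev_model.parameters(), model.parameters()):
            p_prev.mul_(beta).add_(p, alpha=1.0 - beta)
    return loss.item()


if __name__ == "__main__":
    model = torch.nn.Linear(16, 3)                       # stand-in for the trainable adapter head
    prev_model = copy.deepcopy(model).requires_grad_(False)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
    print(bregman_proximal_step(model, prev_model, opt, x, y))
```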