SPMGAE: Self-purified masked graph autoencoders release robust expression power
Published in: Neurocomputing (Amsterdam), 2025-01, Vol. 611, p. 128631, Article 128631
Format: Article
Language: English
Online access: Full text
Abstract: To tackle the scarcity of labeled graph data, graph self-supervised learning (SSL) has branched into two paradigms: generative methods and contrastive methods. Inspired by MAE in computer vision (CV) and BERT in natural language processing (NLP), masked graph autoencoders (MGAEs) are gaining popularity within the generative genre. However, prevailing MGAEs are mostly designed under the assumption that the data has a high homophily score and is free of adversarial distortion. In the pursuit of better performance on homophilic graph datasets, prior work overlooks a critical issue: both internal heterophily and artificial attack noise are common in the real world. Consequently, when the data is highly heterophilic or subjected to attacks, these models have almost no defensive capability, and under self-supervised conditions it is even harder to detect internal heterophily and resist artificial attacks. In this paper, we propose a Self-Purified Masked Graph Autoencoder (SPMGAE) to address the robustness shortcomings of prevailing MGAEs. SPMGAE first uses a self-purified module to prune the raw graph and separate out perturbation information; the purified graph provides a robust structure for the entire pre-training process. The encoding module then reuses the perturbation information for auxiliary training to enhance robustness, while the decoding module reconstructs the effective graph data at a finer granularity. Extensive experiments on homophilic and heterophilic datasets under various attack methods demonstrate that SPMGAE has considerable robust expressive ability. On small datasets with large perturbations in particular, the improvement in defensive performance can reach 10%–25%.
Highlights:
• We are the first to investigate the vulnerabilities of Masked Graph Autoencoders.
• We analyze SOTA MGAEs to understand the reasons for their lack of robustness.
• We propose a feasible plug-and-play self-purified scheme for MGAEs.
• We propose a flexible robust pre-training network, SPMGAE.
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.128631
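
The abstract outlines a three-stage pipeline: purify the graph structure, mask and encode node features, then reconstruct them. The following minimal PyTorch sketch illustrates that flow under stated assumptions; the similarity-based edge pruning, the two-layer GCN encoder/decoder, and all hyperparameters are illustrative guesses rather than the authors' implementation, and the reuse of the separated perturbation edges for auxiliary training is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def purify_adjacency(x, adj, sim_threshold=0.1):
    """Prune edges whose endpoints have dissimilar features (assumed purification rule)."""
    x_norm = F.normalize(x, dim=1)
    sim = x_norm @ x_norm.t()                  # cosine similarity between all node pairs
    keep = (sim >= sim_threshold).float()
    purified = adj * keep                      # cleaned structure used for pre-training
    separated = adj * (1.0 - keep)             # edges set aside as suspected perturbations
    return purified, separated


class GCNLayer(nn.Module):
    """Plain dense GCN layer with symmetric normalization."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)   # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).clamp(min=1e-12).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return a_norm @ self.lin(x)


class MaskedGraphAE(nn.Module):
    """Masked feature reconstruction on the purified graph (standard MGAE objective)."""
    def __init__(self, in_dim, hid_dim, mask_rate=0.5):
        super().__init__()
        self.mask_rate = mask_rate
        self.mask_token = nn.Parameter(torch.zeros(1, in_dim))
        self.encoder = GCNLayer(in_dim, hid_dim)
        self.decoder = GCNLayer(hid_dim, in_dim)

    def forward(self, x, adj):
        mask = torch.rand(x.size(0), device=x.device) < self.mask_rate
        x_masked = torch.where(mask.unsqueeze(1), self.mask_token.expand_as(x), x)
        h = F.relu(self.encoder(x_masked, adj))
        x_rec = self.decoder(h, adj)
        # reconstruct only the masked node features
        return F.mse_loss(x_rec[mask], x[mask])


# Toy usage on a random symmetric graph.
torch.manual_seed(0)
x = torch.randn(8, 16)
adj = (torch.rand(8, 8) < 0.3).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)

purified_adj, perturbation_adj = purify_adjacency(x, adj)
model = MaskedGraphAE(in_dim=16, hid_dim=32)
loss = model(x, purified_adj)
loss.backward()
print(f"reconstruction loss: {loss.item():.4f}")
```

The key design point mirrored here is that purification happens before masking, so the masked-reconstruction objective is computed on a structure from which suspected perturbation edges have already been separated.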