Self-consistent Graph Neural Networks for Semi-supervised Node Classification
Published in: | IEEE transactions on big data 2023-08, Vol.9 (4), p.1-13 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Graph Neural Networks (GNNs), a powerful graph representation technique based on deep learning, have attracted great research interest in recent years. Although many GNNs have achieved state-of-the-art accuracy on a set of standard benchmark datasets, they are still limited to the traditional semi-supervised framework and lack sufficient supervision information, especially for the large amount of unlabeled data. To overcome this issue, we propose a novel self-consistent graph neural network (SCGNN) framework that enriches the supervision information from two aspects: the self-consistency of unlabeled data and the label information of labeled data. First, to extract self-supervision information from the numerous unlabeled nodes, we perform graph data augmentation and apply a self-consistency constraint that maximizes the mutual information of the unlabeled nodes across different augmented graph views. This self-consistency exploits the intrinsic structural attributes of the graph to extract self-supervision information from unlabeled data and improve the subsequent classification result. Second, to further extract supervision information from the scarce labeled nodes, we introduce a fusion mechanism that obtains comprehensive node embeddings by fusing the node representations of two positive graph views, and we optimize the classification loss over labeled nodes to maximize the utilization of label information. We conduct comprehensive empirical studies on six public benchmark datasets for the node classification task. In terms of accuracy, SCGNN improves on the best baseline by an average of 2.08%, and by 5.8% on the Disease dataset. |
---|---|
ISSN: | 2332-7790 2372-2096 |
DOI: | 10.1109/TBDATA.2023.3266590 |
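
The two-part objective described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record does not specify the augmentation, the mutual-information estimator, the encoder, or the fusion rule. The sketch assumes random edge dropping as the augmentation, an InfoNCE-style contrastive loss as a stand-in for mutual-information maximization, a one-layer GCN-style encoder, and simple averaging as the fusion. All function names and the weighting parameter `lam` are hypothetical.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def drop_edges(adj, rate, rng):
    """Assumed augmentation: drop each undirected edge with probability `rate`."""
    keep = np.triu(rng.random(adj.shape) >= rate, k=1)
    upper = np.triu(adj, k=1) * keep
    return upper + upper.T

def encode(adj, x, w):
    """One-layer GCN-style encoder (an assumption, not the paper's architecture)."""
    return np.tanh(normalize_adj(adj) @ x @ w)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def consistency_loss(z1, z2, temperature=0.5):
    """InfoNCE-style loss: node i in view 1 should match node i in view 2."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    log_prob = log_softmax(z1 @ z2.T / temperature)
    return -np.mean(np.diag(log_prob))

def classification_loss(z, w_out, labels, labeled_idx):
    """Cross-entropy over the labeled nodes only."""
    log_prob = log_softmax(z[labeled_idx] @ w_out)
    return -np.mean(log_prob[np.arange(len(labeled_idx)), labels[labeled_idx]])

def scgnn_style_loss(adj, x, labels, labeled_idx, w, w_out, rng, lam=1.0):
    """Supervised loss on fused embeddings plus consistency on unlabeled nodes."""
    z1 = encode(drop_edges(adj, 0.2, rng), x, w)  # first augmented view
    z2 = encode(drop_edges(adj, 0.2, rng), x, w)  # second augmented view
    fused = 0.5 * (z1 + z2)                       # assumed fusion: plain average
    unlabeled_idx = np.setdiff1d(np.arange(adj.shape[0]), labeled_idx)
    sup = classification_loss(fused, w_out, labels, labeled_idx)
    con = consistency_loss(z1[unlabeled_idx], z2[unlabeled_idx])
    return sup + lam * con
```

In this sketch the consistency term uses every unlabeled node, while the cross-entropy term touches only `labeled_idx`, mirroring the two sources of supervision named in the summary. Training would minimize the combined loss with respect to `w` and `w_out` using an automatic-differentiation library, which is omitted here.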