SHINE: A Scalable Heterogeneous Inductive Graph Neural Network for Large Imbalanced Datasets

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2024-09, Vol. 36 (9), pp. 4904-4915
Main authors:	Van Belle, Rafael; De Weerdt, Jochen
Format:	Article
Language:	English
Subjects:	Class imbalance; Classification algorithms; Fraud; fraud detection; graph neural network (GNN); Graph neural networks; heterogeneous graph; Heuristic algorithms; Image edge detection; inductive node classification; Self-supervised learning; Training

Description
Abstract:	Research interest in machine learning (ML) for graphs has skyrocketed in recent years. However, non-Euclidean graph structures inhibit the application of traditional ML algorithms. Consequently, scholars introduced graph learning algorithms tailored to network data, such as graph neural networks (GNNs). Most GNNs are designed for homogeneous and homophilous graphs and are evaluated on small, static, and balanced datasets, deviating from real-world conditions and industry applications. This paper introduces SHINE, a scalable heterogeneous inductive GNN for large imbalanced datasets. SHINE addresses four key challenges: scalability, network heterogeneity, inductive learning on dynamic graphs, and imbalanced node classification. SHINE comprises three core components: 1) a sampler based on nearest-neighbor (NN) search, 2) a heterogeneous GNN (HGNN) layer with a novel relationship aggregator, and 3) aggregator functions tailored to skewed class distributions. The components of SHINE are evaluated on benchmark datasets, while the integrated benefits of SHINE are demonstrated on two fraud detection datasets.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2024.3381240
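
Illustrative note:	The abstract names a sampler based on nearest-neighbor (NN) search but gives no algorithmic detail. The sketch below is only one generic way such a sampler could look, not the authors' method. It assumes that sampling means keeping the k graph neighbors whose feature vectors have the highest cosine similarity to the target node. The function name sample_neighbors_by_similarity, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def sample_neighbors_by_similarity(features, adjacency, node, k):
    """Keep up to k neighbors of `node`, ranked by cosine similarity.

    Hypothetical illustration only; the record does not describe how the
    SHINE sampler actually works.
    """
    neighbors = np.asarray(adjacency.get(node, []), dtype=int)
    if neighbors.size == 0:
        return neighbors

    target = features[node]
    candidates = features[neighbors]

    # Cosine similarity between the target node and each neighbor;
    # eps guards against division by zero for all-zero feature vectors.
    eps = 1e-12
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(target)
    similarities = candidates @ target / (norms + eps)

    # Sort in descending order of similarity and keep the first k.
    order = np.argsort(-similarities, kind="stable")[:k]
    return neighbors[order]


# Toy graph (assumed data): node 0 has three neighbors.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]])
adjacency = {0: [1, 2, 3]}
sampled = sample_neighbors_by_similarity(features, adjacency, node=0, k=2)
```

A brute-force similarity ranking like this scales linearly with a node's degree. On large graphs, an approximate NN index would be the more likely choice, but the record does not say which approach the paper takes.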