TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Tabular machine learning is an important field for industry and science. In
this field, table rows are usually treated as independent data samples, but
additional information about relations between them is sometimes available and
can be used to improve predictive performance. Such information can be
naturally modeled with a graph, thus tabular machine learning may benefit from
graph machine learning methods. However, graph machine learning models are
typically evaluated on datasets with homogeneous node features, which have
little in common with heterogeneous mixtures of numerical and categorical
features present in tabular datasets. Thus, there is a critical difference
between the data used in tabular and graph machine learning studies, which does
not allow one to understand how successfully graph models can be transferred to
tabular data. To bridge this gap, we propose a new benchmark of diverse graphs
with heterogeneous tabular node features and realistic prediction tasks. We use
this benchmark to evaluate a vast set of models, including simple methods
previously overlooked in the literature. Our experiments show that graph neural
networks (GNNs) can indeed often bring gains in predictive performance for
tabular data, but standard tabular models can also be adapted to work with
graph data by using simple feature preprocessing, which sometimes enables them
to compete with and even outperform GNNs. Based on our empirical study, we
provide insights for researchers and practitioners in both tabular and graph
machine learning fields. |
---|---|
DOI: | 10.48550/arxiv.2409.14500 |
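
The summary mentions that standard tabular models can be adapted to graph data through simple feature preprocessing, but this record does not say what that preprocessing is. One plausible form, shown here purely as an illustrative sketch, is to append to each row the mean of its graph neighbors' numerical features, so that any tabular model sees some neighborhood information. The function name `augment_with_neighbor_means`, the choice of the mean as the aggregate, and the toy data are all assumptions made for this example, not details taken from the paper.

```python
from statistics import mean


def augment_with_neighbor_means(features, edges):
    """Append, for every node, the mean of each feature over its neighbors.

    features: list of rows, one list of floats per node (hypothetical layout).
    edges: iterable of undirected (u, v) pairs of row indices.
    Nodes without neighbors receive zeros in the aggregated columns.
    """
    n = len(features)
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    width = len(features[0]) if features else 0
    augmented = []
    for node, row in enumerate(features):
        if neighbors[node]:
            # Column-wise mean over the rows of all adjacent nodes.
            agg = [mean(features[j][k] for j in neighbors[node])
                   for k in range(width)]
        else:
            agg = [0.0] * width
        augmented.append(list(row) + agg)
    return augmented


# Toy table: 4 rows with 2 numerical features each; node 3 is isolated.
rows = [[1.0, 10.0], [3.0, 20.0], [5.0, 60.0], [7.0, 0.0]]
edges = [(0, 1), (0, 2)]
table = augment_with_neighbor_means(rows, edges)
```

The augmented `table` has twice as many columns as `rows` and could be passed unchanged to any tabular learner. Categorical features would need their own aggregate (for example, per-category frequencies among neighbors), which this sketch leaves out.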