XTab: Cross-table Pretraining for Tabular Transformers
Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online Access: | Order full text |
Abstract:

The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, and (2) by pretraining FT-Transformer via XTab, we achieve superior performance compared with other state-of-the-art tabular deep learning models on tasks including regression, binary classification, and multiclass classification.
DOI: 10.48550/arxiv.2305.06090
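The abstract describes per-table featurizers feeding a shared, cross-table pretrained transformer component. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not XTab's actual code: every class and function name (`TableFeaturizer`, `SharedBackbone`, `pretrain_round`), the masked-cell reconstruction objective, and the FedAvg-style averaging of the shared backbone are assumptions chosen for brevity.

```python
# Hypothetical sketch of cross-table pretraining: table-specific featurizers,
# a shared transformer backbone, and federated-style averaging across tables.
import torch
import torch.nn as nn


class TableFeaturizer(nn.Module):
    """Table-specific featurizer: embeds each numeric column as one token."""

    def __init__(self, n_columns: int, d_model: int):
        super().__init__()
        # One learned linear embedding per column (numeric columns only, for brevity).
        self.weights = nn.Parameter(torch.randn(n_columns, d_model) * 0.02)
        self.biases = nn.Parameter(torch.zeros(n_columns, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_columns) -> tokens: (batch, n_columns, d_model)
        return x.unsqueeze(-1) * self.weights + self.biases


class SharedBackbone(nn.Module):
    """Transformer encoder shared across all tables (the cross-table component)."""

    def __init__(self, d_model: int = 64, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.encoder(tokens)


def pretrain_round(backbone, tables, d_model=64, local_steps=10, lr=1e-3):
    """One communication round in the spirit of federated pretraining:
    each table trains its own featurizer plus a local copy of the backbone on an
    assumed masked-cell reconstruction objective; backbone updates are averaged."""
    new_states = []
    for x in tables:  # x: (n_rows, n_columns) tensor for one table
        featurizer = TableFeaturizer(x.shape[1], d_model)
        local = SharedBackbone(d_model)
        local.load_state_dict(backbone.state_dict())
        head = nn.Linear(d_model, 1)  # per-table reconstruction head
        opt = torch.optim.Adam(list(featurizer.parameters()) +
                               list(local.parameters()) +
                               list(head.parameters()), lr=lr)
        for _ in range(local_steps):
            mask = (torch.rand_like(x) < 0.15).float()   # corrupt ~15% of cells
            corrupted = x * (1 - mask)                   # zero out masked cells
            tokens = local(featurizer(corrupted))
            recon = head(tokens).squeeze(-1)             # predict original cell values
            loss = ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)
            opt.zero_grad()
            loss.backward()
            opt.step()
        new_states.append(local.state_dict())
    # FedAvg-style averaging of the shared backbone across tables.
    avg = {k: torch.stack([s[k] for s in new_states]).mean(0) for k in new_states[0]}
    backbone.load_state_dict(avg)


if __name__ == "__main__":
    backbone = SharedBackbone()
    # Two toy "tables" with different column counts, standing in for different domains.
    tables = [torch.randn(256, 5), torch.randn(256, 9)]
    for _ in range(3):
        pretrain_round(backbone, tables)
```

The sketch only captures the structural split the abstract mentions (featurizers stay table-specific while the backbone is shared and jointly pretrained); the paper's actual objectives, featurizer design, and federated protocol may differ.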