TabularFM: An Open Framework For Tabular Foundational Models
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Foundational models (FMs), pretrained on extensive datasets using
self-supervised techniques, are capable of learning generalized patterns from
large amounts of data. This reduces the need for extensive labeled datasets for
each new task, saving both time and resources by leveraging the broad knowledge
base established during pretraining. Most research on FMs has primarily focused
on unstructured data, such as text and images, or semi-structured data, like
time-series. However, there has been limited attention to structured data, such
as tabular data, which, despite its prevalence, remains under-studied due to a
lack of clean datasets and insufficient research on the transferability of FMs
for various tabular data tasks. In response to this gap, we introduce a
framework called TabularFM, which incorporates state-of-the-art methods for
developing FMs specifically for tabular data. This includes variations of
neural architectures such as GANs, VAEs, and Transformers. We have curated a
million tabular datasets and released cleaned versions to facilitate the
development of tabular FMs. We pretrained FMs on this curated data, benchmarked
various learning methods on these datasets, and released the pretrained models
along with leaderboards for future comparative studies. Our fully open-sourced
system provides a comprehensive analysis of the transferability of tabular FMs.
By releasing these datasets, pretrained models, and leaderboards, we aim to
enhance the validity and usability of tabular FMs in the near future. |
---|---|
DOI: | 10.48550/arxiv.2406.09837 |