XTab: Cross-table Pretraining for Tabular Transformers

Bibliographic Details
Main Authors: Zhu, Bingzhao; Shi, Xingjian; Erickson, Nick; Li, Mu; Karypis, George; Shoaran, Mahsa
Format: Article
Language: eng
Subjects: Computer Science - Learning
creator Zhu, Bingzhao; Shi, Xingjian; Erickson, Nick; Li, Mu; Karypis, George; Shoaran, Mahsa
description The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, and (2) by pretraining FT-Transformer via XTab, we achieve performance superior to other state-of-the-art tabular deep learning models on various tasks such as regression, binary classification, and multiclass classification. (A minimal illustrative sketch of this featurizer-plus-shared-backbone setup appears after the record fields below.)
doi_str_mv 10.48550/arxiv.2305.06090
format Article
identifier DOI: 10.48550/arxiv.2305.06090
language eng
recordid cdi_arxiv_primary_2305_06090
source arXiv.org
subjects Computer Science - Learning
title XTab: Cross-table Pretraining for Tabular Transformers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T23%3A41%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=XTab:%20Cross-table%20Pretraining%20for%20Tabular%20Transformers&rft.au=Zhu,%20Bingzhao&rft.date=2023-05-10&rft_id=info:doi/10.48550/arxiv.2305.06090&rft_dat=%3Carxiv_GOX%3E2305_06090%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
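
Illustrative sketch of the approach described in the abstract (not the authors' released code): each table keeps its own featurizer and prediction head so that differing column counts and types can be handled independently, while a single transformer backbone is shared across all tables and pretrained with a federated, FedAvg-style aggregation of its weights. All class and function names below (TableFeaturizer, SharedBackbone, pretrain_round) are assumptions made for this sketch, written in PyTorch.

import torch
import torch.nn as nn

class TableFeaturizer(nn.Module):
    # Table-specific featurizer: embeds each numeric column of one table into a token
    # and prepends a [CLS]-style token. (Illustrative name, not the paper's API.)
    def __init__(self, n_columns, d_model):
        super().__init__()
        self.col_weight = nn.Parameter(torch.randn(n_columns, d_model) * 0.02)
        self.col_bias = nn.Parameter(torch.zeros(n_columns, d_model))
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))

    def forward(self, x):                        # x: (batch, n_columns), float
        tokens = x.unsqueeze(-1) * self.col_weight + self.col_bias
        cls = self.cls.expand(x.shape[0], -1, -1)
        return torch.cat([cls, tokens], dim=1)   # (batch, n_columns + 1, d_model)

class SharedBackbone(nn.Module):
    # The component shared by every table; cross-table pretraining learns these weights.
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):
        return self.encoder(tokens)[:, 0]        # representation of the [CLS] token

def pretrain_round(backbone, table_batches, d_model=64, lr=1e-3):
    # One federated-style round (sketch): each table trains a local copy of the shared
    # backbone together with its own featurizer and head, then only the backbone
    # weights are averaged (FedAvg-style) back into the global model.
    local_states = []
    for x, y in table_batches:                   # one (features, targets) batch per table
        featurizer = TableFeaturizer(x.shape[1], d_model)
        head = nn.Linear(d_model, 1)
        local = SharedBackbone(d_model)
        local.load_state_dict(backbone.state_dict())
        params = list(featurizer.parameters()) + list(local.parameters()) + list(head.parameters())
        opt = torch.optim.Adam(params, lr=lr)
        loss = nn.functional.mse_loss(head(local(featurizer(x))).squeeze(-1), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        local_states.append(local.state_dict())
    averaged = {k: torch.stack([s[k].detach() for s in local_states]).mean(dim=0)
                for k in local_states[0]}
    backbone.load_state_dict(averaged)

# Hypothetical usage: two toy tables with different column counts share one backbone.
backbone = SharedBackbone(d_model=64)
tables = [(torch.randn(32, 5), torch.randn(32)),
          (torch.randn(32, 12), torch.randn(32))]
pretrain_round(backbone, tables)

In the paper's setting, the pretrained shared component would presumably be used to initialize a tabular transformer such as FT-Transformer before fine-tuning on a new downstream table; the sketch above only illustrates the per-table featurizer / shared backbone split and the federated aggregation of the shared weights.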