XTab: Cross-table Pretraining for Tabular Transformers

Bibliographic Details
Main Authors: Zhu, Bingzhao; Shi, Xingjian; Erickson, Nick; Li, Mu; Karypis, George; Shoaran, Mahsa
Format: Article
Language: eng
Subjects: Computer Science - Learning
creator Zhu, Bingzhao; Shi, Xingjian; Erickson, Nick; Li, Mu; Karypis, George; Shoaran, Mahsa
description The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, and (2) by pretraining FT-Transformer via XTab, we achieve performance superior to other state-of-the-art tabular deep learning models on various tasks such as regression, binary classification, and multiclass classification. (A minimal illustrative sketch of this featurizer-plus-shared-backbone setup appears after the record fields below.)
doi_str_mv 10.48550/arxiv.2305.06090
format Article
identifier DOI: 10.48550/arxiv.2305.06090
language eng
recordid cdi_arxiv_primary_2305_06090
source arXiv.org
subjects Computer Science - Learning
title XTab: Cross-table Pretraining for Tabular Transformers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T23%3A41%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=XTab:%20Cross-table%20Pretraining%20for%20Tabular%20Transformers&rft.au=Zhu,%20Bingzhao&rft.date=2023-05-10&rft_id=info:doi/10.48550/arxiv.2305.06090&rft_dat=%3Carxiv_GOX%3E2305_06090%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
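
Illustrative sketch of the approach described in the abstract (not the authors' released code): each table keeps its own featurizer and prediction head so that differing column counts and types can be handled independently, while a single transformer backbone is shared across all tables and pretrained with a federated, FedAvg-style aggregation of its weights. All class and function names below (TableFeaturizer, SharedBackbone, pretrain_round) are assumptions made for this sketch, written in PyTorch.

import torch
import torch.nn as nn

class TableFeaturizer(nn.Module):
    # Table-specific featurizer: embeds each numeric column of one table into a token
    # and prepends a [CLS]-style token. (Illustrative name, not the paper's API.)
    def __init__(self, n_columns, d_model):
        super().__init__()
        self.col_weight = nn.Parameter(torch.randn(n_columns, d_model) * 0.02)
        self.col_bias = nn.Parameter(torch.zeros(n_columns, d_model))
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))

    def forward(self, x):                        # x: (batch, n_columns), float
        tokens = x.unsqueeze(-1) * self.col_weight + self.col_bias
        cls = self.cls.expand(x.shape[0], -1, -1)
        return torch.cat([cls, tokens], dim=1)   # (batch, n_columns + 1, d_model)

class SharedBackbone(nn.Module):
    # The component shared by every table; cross-table pretraining learns these weights.
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):
        return self.encoder(tokens)[:, 0]        # representation of the [CLS] token

def pretrain_round(backbone, table_batches, d_model=64, lr=1e-3):
    # One federated-style round (sketch): each table trains a local copy of the shared
    # backbone together with its own featurizer and head, then only the backbone
    # weights are averaged (FedAvg-style) back into the global model.
    local_states = []
    for x, y in table_batches:                   # one (features, targets) batch per table
        featurizer = TableFeaturizer(x.shape[1], d_model)
        head = nn.Linear(d_model, 1)
        local = SharedBackbone(d_model)
        local.load_state_dict(backbone.state_dict())
        params = list(featurizer.parameters()) + list(local.parameters()) + list(head.parameters())
        opt = torch.optim.Adam(params, lr=lr)
        loss = nn.functional.mse_loss(head(local(featurizer(x))).squeeze(-1), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        local_states.append(local.state_dict())
    averaged = {k: torch.stack([s[k].detach() for s in local_states]).mean(dim=0)
                for k in local_states[0]}
    backbone.load_state_dict(averaged)

# Hypothetical usage: two toy tables with different column counts share one backbone.
backbone = SharedBackbone(d_model=64)
tables = [(torch.randn(32, 5), torch.randn(32)),
          (torch.randn(32, 12), torch.randn(32))]
pretrain_round(backbone, tables)

In the paper's setting, the pretrained shared component would presumably be used to initialize a tabular transformer such as FT-Transformer before fine-tuning on a new downstream table; the sketch above only illustrates the per-table featurizer / shared backbone split and the federated aggregation of the shared weights.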