TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023

Deep learning (DL) models for tabular data problems (e.g., classification, regression) are receiving increasing attention from researchers. However, despite the recent efforts, non-DL algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution for these problems. One of the research directions aimed at improving the position of tabular DL involves designing so-called retrieval-augmented models. For a target object, such models retrieve other objects (e.g., the nearest neighbors) from the available training data and use their features and labels to make a better prediction. In this work, we present TabR -- essentially, a feed-forward network with a custom k-Nearest-Neighbors-like component in the middle. On a set of public benchmarks with datasets of up to several million objects, TabR marks a big step forward for tabular DL: it demonstrates the best average performance among tabular DL models, becomes the new state of the art on several datasets, and even outperforms GBDT models on the recently proposed "GBDT-friendly" benchmark (see Figure 1). Among the important findings and technical details powering TabR, the main ones lie in the attention-like mechanism that retrieves the nearest neighbors and extracts valuable signal from them. In addition to its much higher performance, TabR is simple and significantly more efficient than prior retrieval-based tabular DL models.
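The retrieval component the abstract describes can be illustrated with a minimal sketch: for a target object, score candidate training objects by similarity, keep the k most similar, and aggregate their labels with attention-like (softmax) weights. This is an illustrative toy, not the paper's exact architecture; the function name `retrieval_module`, the negative-squared-distance similarity, and the label-only aggregation are assumptions made for brevity.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieval_module(query, keys, labels, k=3):
    """Attention-like aggregation over the k most similar training objects.

    query:  (d,) embedding of the target object
    keys:   (n, d) embeddings of candidate training objects
    labels: (n,) their labels
    Returns a weighted average of the neighbors' labels.
    """
    # Similarity as negative squared L2 distance between embeddings.
    sims = -((keys - query) ** 2).sum(axis=1)
    topk = np.argsort(sims)[-k:]          # indices of the k most similar objects
    weights = softmax(sims[topk])         # attention-like weights over the neighbors
    return float(weights @ labels[topk])  # label-informed signal for the prediction
```

In the actual model, this kind of signal is produced inside a feed-forward network and combined with the target object's own representation before the final prediction head; here the weighted label average simply stands in for that retrieved signal.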

Bibliographic details
Published in: arXiv.org, 2023-10
Main authors: Gorishniy, Yury; Rubachev, Ivan; Kartashev, Nikolay; Shlenskii, Daniil; Kotelnikov, Akim; Babenko, Artem
Format: Article
Language: eng
Subjects:
Online access: Full text
identifier EISSN: 2331-8422
source Free E-Journals
subjects Algorithms
Benchmarks
Computer vision
Decision trees
Deep learning
Natural language processing
Retrieval
Tables (data)