Deep Neural Networks and Tabular Data: A Survey

Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adap...

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-06, Vol.35 (6), p.7499-7519
Main authors: Borisov, Vadim; Leemann, Tobias; Seßler, Kathrin; Haug, Johannes; Pawelczyk, Martin; Kasneci, Gjergji
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 7519
container_issue 6
container_start_page 7499
container_title IEEE Transactions on Neural Networks and Learning Systems
container_volume 35
creator Borisov, Vadim
Leemann, Tobias
Seßler, Kathrin
Haug, Johannes
Pawelczyk, Martin
Kasneci, Gjergji
description Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains highly challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data and also provide an overview over strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas while highlighting relevant challenges and open research questions. Our second contribution is to provide an empirical comparison of traditional machine learning methods with 11 deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that the research progress on competitive deep learning models for tabular data is stagnating. To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, this work can serve as a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data.
doi_str_mv 10.1109/TNNLS.2022.3229161
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TNNLS_2022_3229161</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9998482</ieee_id><sourcerecordid>3064714015</sourcerecordid><originalsourceid>FETCH-LOGICAL-c444t-5778397fbd2141a1452abc053a010ae825cab9ef93049694609b70f8ab657853</originalsourceid><addsrcrecordid>eNpdkEtPAjEQgBujEYL8AU3MJl68LHT6rjcCvhKCB_bgreku3QRcWGy3Gv69iyAH5zKTzDeTmQ-ha8ADAKyH2Ww2nQ8IJmRACdEg4Ax1CQiSEqrU-amW7x3UD2GF2xCYC6YvUYdKDJwq6KLhxLltMnPR26pNzXftP0JiN4sks3msrE8mtrEPySiZR__ldlfoorRVcP1j7qHs6TEbv6TTt-fX8WiaFoyxJuVSKqplmS8IMLDAOLF5gTm1GLB1ivDC5tqVmmKmhWYC61ziUtlccKk47aH7w9qtrz-jC41ZL0PhqspuXB2DIVILEFgp1qJ3_9BVHf2mPc5QLJgEtv-1h8iBKnwdgnel2frl2vqdAWz2Qs2vULMXao5C26Hb4-qYr93iNPKnrwVuDsDSOXdqa60VU4T-APWUdZw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3064714015</pqid></control><display><type>article</type><title>Deep Neural Networks and Tabular Data: A Survey</title><source>IEEE Electronic Library (IEL)</source><creator>Borisov, Vadim ; Leemann, Tobias ; Sebler, Kathrin ; Haug, Johannes ; Pawelczyk, Martin ; Kasneci, Gjergji</creator><creatorcontrib>Borisov, Vadim ; Leemann, Tobias ; Sebler, Kathrin ; Haug, Johannes ; Pawelczyk, Martin ; Kasneci, Gjergji</creatorcontrib><description>Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains highly challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. 
We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data and also provide an overview over strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas while highlighting relevant challenges and open research questions. Our second contribution is to provide an empirical comparison of traditional machine learning methods with 11 deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that the research progress on competitive deep learning models for tabular data is stagnating. 
To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, this work can serve as a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data.</description><identifier>ISSN: 2162-237X</identifier><identifier>EISSN: 2162-2388</identifier><identifier>DOI: 10.1109/TNNLS.2022.3229161</identifier><identifier>PMID: 37015381</identifier><identifier>CODEN: ITNNAL</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Algorithms ; Artificial neural networks ; Benchmark ; Benchmarks ; Cognitive tasks ; Data models ; Datasets ; Deep learning ; deep neural networks ; discrete data ; heterogeneous data ; interpretability ; Machine learning ; Neural networks ; Predictive models ; Probabilistic logic ; probabilistic modeling ; Regularization ; State-of-the-art reviews ; Supervised learning ; survey ; Tables (data) ; tabular data ; tabular data generation ; Task analysis ; Training</subject><ispartof>IEEE transaction on neural networks and learning systems, 2024-06, Vol.35 (6), p.7499-7519</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c444t-5778397fbd2141a1452abc053a010ae825cab9ef93049694609b70f8ab657853</citedby><cites>FETCH-LOGICAL-c444t-5778397fbd2141a1452abc053a010ae825cab9ef93049694609b70f8ab657853</cites><orcidid>0000-0002-4889-9989 ; 0000-0002-3380-4641 ; 0000-0002-6191-4434 ; 0000-0002-3123-7268 ; 0000-0003-1286-3551 ; 0000-0001-9333-228X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9998482$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37015381$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Borisov, Vadim</creatorcontrib><creatorcontrib>Leemann, Tobias</creatorcontrib><creatorcontrib>Sebler, Kathrin</creatorcontrib><creatorcontrib>Haug, Johannes</creatorcontrib><creatorcontrib>Pawelczyk, Martin</creatorcontrib><creatorcontrib>Kasneci, Gjergji</creatorcontrib><title>Deep Neural Networks and Tabular Data: A Survey</title><title>IEEE transaction on neural networks and learning systems</title><addtitle>TNNLS</addtitle><addtitle>IEEE Trans Neural Netw Learn Syst</addtitle><description>Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains highly challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. 
We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data and also provide an overview over strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas while highlighting relevant challenges and open research questions. Our second contribution is to provide an empirical comparison of traditional machine learning methods with 11 deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that the research progress on competitive deep learning models for tabular data is stagnating. 
To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, this work can serve as a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>Benchmark</subject><subject>Benchmarks</subject><subject>Cognitive tasks</subject><subject>Data models</subject><subject>Datasets</subject><subject>Deep learning</subject><subject>deep neural networks</subject><subject>discrete data</subject><subject>heterogeneous data</subject><subject>interpretability</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Predictive models</subject><subject>Probabilistic logic</subject><subject>probabilistic modeling</subject><subject>Regularization</subject><subject>State-of-the-art reviews</subject><subject>Supervised learning</subject><subject>survey</subject><subject>Tables (data)</subject><subject>tabular data</subject><subject>tabular data generation</subject><subject>Task analysis</subject><subject>Training</subject><issn>2162-237X</issn><issn>2162-2388</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><recordid>eNpdkEtPAjEQgBujEYL8AU3MJl68LHT6rjcCvhKCB_bgreku3QRcWGy3Gv69iyAH5zKTzDeTmQ-ha8ADAKyH2Ww2nQ8IJmRACdEg4Ax1CQiSEqrU-amW7x3UD2GF2xCYC6YvUYdKDJwq6KLhxLltMnPR26pNzXftP0JiN4sks3msrE8mtrEPySiZR__ldlfoorRVcP1j7qHs6TEbv6TTt-fX8WiaFoyxJuVSKqplmS8IMLDAOLF5gTm1GLB1ivDC5tqVmmKmhWYC61ziUtlccKk47aH7w9qtrz-jC41ZL0PhqspuXB2DIVILEFgp1qJ3_9BVHf2mPc5QLJgEtv-1h8iBKnwdgnel2frl2vqdAWz2Qs2vULMXao5C26Hb4-qYr93iNPKnrwVuDsDSOXdqa60VU4T-APWUdZw</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Borisov, Vadim</creator><creator>Leemann, Tobias</creator><creator>Sebler, Kathrin</creator><creator>Haug, 
Johannes</creator><creator>Pawelczyk, Martin</creator><creator>Kasneci, Gjergji</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QP</scope><scope>7QQ</scope><scope>7QR</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TK</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-4889-9989</orcidid><orcidid>https://orcid.org/0000-0002-3380-4641</orcidid><orcidid>https://orcid.org/0000-0002-6191-4434</orcidid><orcidid>https://orcid.org/0000-0002-3123-7268</orcidid><orcidid>https://orcid.org/0000-0003-1286-3551</orcidid><orcidid>https://orcid.org/0000-0001-9333-228X</orcidid></search><sort><creationdate>20240601</creationdate><title>Deep Neural Networks and Tabular Data: A Survey</title><author>Borisov, Vadim ; Leemann, Tobias ; Sebler, Kathrin ; Haug, Johannes ; Pawelczyk, Martin ; Kasneci, Gjergji</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c444t-5778397fbd2141a1452abc053a010ae825cab9ef93049694609b70f8ab657853</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>Benchmark</topic><topic>Benchmarks</topic><topic>Cognitive tasks</topic><topic>Data models</topic><topic>Datasets</topic><topic>Deep learning</topic><topic>deep neural networks</topic><topic>discrete data</topic><topic>heterogeneous 
data</topic><topic>interpretability</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Predictive models</topic><topic>Probabilistic logic</topic><topic>probabilistic modeling</topic><topic>Regularization</topic><topic>State-of-the-art reviews</topic><topic>Supervised learning</topic><topic>survey</topic><topic>Tables (data)</topic><topic>tabular data</topic><topic>tabular data generation</topic><topic>Task analysis</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Borisov, Vadim</creatorcontrib><creatorcontrib>Leemann, Tobias</creatorcontrib><creatorcontrib>Sebler, Kathrin</creatorcontrib><creatorcontrib>Haug, Johannes</creatorcontrib><creatorcontrib>Pawelczyk, Martin</creatorcontrib><creatorcontrib>Kasneci, Gjergji</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New 
Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transaction on neural networks and learning systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Borisov, Vadim</au><au>Leemann, Tobias</au><au>Sebler, Kathrin</au><au>Haug, Johannes</au><au>Pawelczyk, Martin</au><au>Kasneci, Gjergji</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep Neural Networks and Tabular Data: A Survey</atitle><jtitle>IEEE transaction on neural networks and learning systems</jtitle><stitle>TNNLS</stitle><addtitle>IEEE Trans Neural Netw Learn Syst</addtitle><date>2024-06-01</date><risdate>2024</risdate><volume>35</volume><issue>6</issue><spage>7499</spage><epage>7519</epage><pages>7499-7519</pages><issn>2162-237X</issn><eissn>2162-2388</eissn><coden>ITNNAL</coden><abstract>Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains highly challenging. 
To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data and also provide an overview over strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas while highlighting relevant challenges and open research questions. Our second contribution is to provide an empirical comparison of traditional machine learning methods with 11 deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that the research progress on competitive deep learning models for tabular data is stagnating. 
To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, this work can serve as a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>37015381</pmid><doi>10.1109/TNNLS.2022.3229161</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0002-4889-9989</orcidid><orcidid>https://orcid.org/0000-0002-3380-4641</orcidid><orcidid>https://orcid.org/0000-0002-6191-4434</orcidid><orcidid>https://orcid.org/0000-0002-3123-7268</orcidid><orcidid>https://orcid.org/0000-0003-1286-3551</orcidid><orcidid>https://orcid.org/0000-0001-9333-228X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2162-237X
ispartof IEEE Transactions on Neural Networks and Learning Systems, 2024-06, Vol.35 (6), p.7499-7519
issn 2162-237X
2162-2388
language eng
recordid cdi_crossref_primary_10_1109_TNNLS_2022_3229161
source IEEE Electronic Library (IEL)
subjects Algorithms
Artificial neural networks
Benchmark
Benchmarks
Cognitive tasks
Data models
Datasets
Deep learning
deep neural networks
discrete data
heterogeneous data
interpretability
Machine learning
Neural networks
Predictive models
Probabilistic logic
probabilistic modeling
Regularization
State-of-the-art reviews
Supervised learning
survey
Tables (data)
tabular data
tabular data generation
Task analysis
Training
title Deep Neural Networks and Tabular Data: A Survey
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T23%3A17%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Neural%20Networks%20and%20Tabular%20Data:%20A%20Survey&rft.jtitle=IEEE%20transaction%20on%20neural%20networks%20and%20learning%20systems&rft.au=Borisov,%20Vadim&rft.date=2024-06-01&rft.volume=35&rft.issue=6&rft.spage=7499&rft.epage=7519&rft.pages=7499-7519&rft.issn=2162-237X&rft.eissn=2162-2388&rft.coden=ITNNAL&rft_id=info:doi/10.1109/TNNLS.2022.3229161&rft_dat=%3Cproquest_cross%3E3064714015%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3064714015&rft_id=info:pmid/37015381&rft_ieee_id=9998482&rfr_iscdi=true