Deep Neural Networks and Tabular Data: A Survey
Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adap...
Published in: | IEEE transactions on neural networks and learning systems 2024-06, Vol.35 (6), p.7499-7519 |
---|---|
Main authors: | Borisov, Vadim ; Leemann, Tobias ; Sebler, Kathrin ; Haug, Johannes ; Pawelczyk, Martin ; Kasneci, Gjergji |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 7519 |
---|---|
container_issue | 6 |
container_start_page | 7499 |
container_title | IEEE transactions on neural networks and learning systems |
container_volume | 35 |
creator | Borisov, Vadim ; Leemann, Tobias ; Sebler, Kathrin ; Haug, Johannes ; Pawelczyk, Martin ; Kasneci, Gjergji |
description | Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains highly challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data and also provide an overview over strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas while highlighting relevant challenges and open research questions. Our second contribution is to provide an empirical comparison of traditional machine learning methods with 11 deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that the research progress on competitive deep learning models for tabular data is stagnating. To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, this work can serve as a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data. |
doi_str_mv | 10.1109/TNNLS.2022.3229161 |
format | Article |
addtitle | TNNLS ; IEEE Trans Neural Netw Learn Syst |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/37015381 |
coden | ITNNAL |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present ; IEEE Open Access Journals ; IEEE All-Society Periodicals Package (ASPP) 1998-Present ; IEEE Electronic Library (IEL) ; PubMed ; CrossRef ; Aluminium Industry Abstracts ; Biotechnology Research Abstracts ; Calcium & Calcified Tissue Abstracts ; Ceramic Abstracts ; Chemoreception Abstracts ; Computer and Information Systems Abstracts ; Corrosion Abstracts ; Electronics & Communications Abstracts ; Engineered Materials Abstracts ; Materials Business File ; Mechanical & Transportation Engineering Abstracts ; Neurosciences Abstracts ; Solid State and Superconductivity Abstracts ; METADEX ; Technology Research Database ; ANTE: Abstracts in New Technology & Engineering ; Engineering Research Database ; Aerospace Database ; Materials Research Database ; ProQuest Computer Science Collection ; Civil Engineering Abstracts ; Advanced Technologies Database with Aerospace ; Computer and Information Systems Abstracts Academic ; Computer and Information Systems Abstracts Professional ; Biotechnology and BioEngineering Abstracts ; MEDLINE - Academic |
creationdate | 2024-06-01 |
delcategory | Remote Search Resource |
eissn | 2162-2388 |
genre | article |
ieee_id | 9998482 |
linktohtml | https://ieeexplore.ieee.org/document/9998482 |
oa | free_for_read |
orcid | 0000-0002-4889-9989 ; 0000-0002-3380-4641 ; 0000-0002-6191-4434 ; 0000-0002-3123-7268 ; 0000-0003-1286-3551 ; 0000-0001-9333-228X |
pmid | 37015381 |
pqid | 3064714015 |
publisher | United States: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
ristype | JOUR |
tpages | 21 |
fulltext | fulltext |
identifier | ISSN: 2162-237X |
ispartof | IEEE transactions on neural networks and learning systems, 2024-06, Vol.35 (6), p.7499-7519 |
issn | 2162-237X ; 2162-2388 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TNNLS_2022_3229161 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Artificial neural networks ; Benchmark ; Benchmarks ; Cognitive tasks ; Data models ; Datasets ; Deep learning ; deep neural networks ; discrete data ; heterogeneous data ; interpretability ; Machine learning ; Neural networks ; Predictive models ; Probabilistic logic ; probabilistic modeling ; Regularization ; State-of-the-art reviews ; Supervised learning ; survey ; Tables (data) ; tabular data ; tabular data generation ; Task analysis ; Training |
title | Deep Neural Networks and Tabular Data: A Survey |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T23%3A17%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Neural%20Networks%20and%20Tabular%20Data:%20A%20Survey&rft.jtitle=IEEE%20transaction%20on%20neural%20networks%20and%20learning%20systems&rft.au=Borisov,%20Vadim&rft.date=2024-06-01&rft.volume=35&rft.issue=6&rft.spage=7499&rft.epage=7519&rft.pages=7499-7519&rft.issn=2162-237X&rft.eissn=2162-2388&rft.coden=ITNNAL&rft_id=info:doi/10.1109/TNNLS.2022.3229161&rft_dat=%3Cproquest_cross%3E3064714015%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3064714015&rft_id=info:pmid/37015381&rft_ieee_id=9998482&rfr_iscdi=true |