Undersampling bankruptcy prediction: Taiwan bankruptcy data

Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect prediction, resulting in substantial economic losses. Many studies have proposed the insolvency imbal...

Bibliographic details
Published in: PloS one 2021-07, Vol.16 (7), p.e0254030-e0254030
Main authors: Wang, Haoming, Liu, Xiangdong
Format: Article
Language: eng
Online access: Full text
container_end_page e0254030
container_issue 7
container_start_page e0254030
container_title PloS one
container_volume 16
creator Wang, Haoming
Liu, Xiangdong
description Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect prediction, resulting in substantial economic losses. Many studies have proposed the insolvency imbalance problem, but little attention has been paid to the effect of the undersampling technology. Therefore, a framework is used to spot-check algorithms quickly and combine which undersampling method and classification model performs best. The results show that Naive Bayes (NB) after Edited Nearest Neighbors (ENN) has the best performance, with an F2-measure of 0.423. In addition, by changing the undersampling rate of the cluster centroid-based method, we find that the performance of the Linear Discriminant Analysis (LDA) and Naive Bayes (NB) are affected by the undersampling rate. Neither of them is uniformly declining, and LDA has higher performance when the undersampling rate is 30%. This study accordingly provides another perspective and a guide for future design.
doi_str_mv 10.1371/journal.pone.0254030
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2547543265</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A667082166</galeid><doaj_id>oai_doaj_org_article_e5768835bfc844bbb9e898bdc224e09b</doaj_id><sourcerecordid>A667082166</sourcerecordid><originalsourceid>FETCH-LOGICAL-c669t-5f9a71029a3f257699ab5ae7dffdc0f3a1714c09f2e8edc1c6e8fac8595e79473</originalsourceid><addsrcrecordid>eNqNkltrFDEUxwdR7EW_geCCIPVh19wmkygIpXhZKBS09TWcySS7WWeTMZlR--3NdkfZkT5IHhLO-eV_LvyL4hlGC0wr_HoThuihXXTBmwUiJUMUPSiOsaRkzgmiDw_eR8VJShuESio4f1wcUYZlVVJ6XLy98Y2JCbZd6_xqVoP_Foeu17ezLprG6d4F_2Z2De4n-MNsAz08KR5ZaJN5Ot6nxc2H99cXn-aXVx-XF-eXc8257OellVBhRCRQS8qKSwl1CaZqrG00shRwhZlG0hIjTKOx5kZY0KKUpakkq-hp8Xyv27UhqXHupPLIVcko4WUmlnuiCbBRXXRbiLcqgFN3gRBXCmLvdGuUyR0IQcvaasFYXdfSCCnqRhPCDJJ11no3Vhvqbe7H-D5COxGdZrxbq1X4oQRhggueBc5GgRi-Dyb1auuSNm0L3oThrm_BMEKUZvTFP-j9043UCvIAztuQ6-qdqDrnvEKCYL4ru7iHyqcxW6ezSazL8cmHV5MPmenNr34FQ0pq-eXz_7NXX6fsywN2baDt1ym0w85KaQqyPahjSCka-3fJGKmdx_9sQ-08rkaP0989qOzk</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2547543265</pqid></control><display><type>article</type><title>Undersampling bankruptcy prediction: Taiwan bankruptcy data</title><source>Public Library of Science</source><source>Full-Text Journals in Chemistry (Open access)</source><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central</source><source>EZB Electronic Journals Library</source><creator>Wang, Haoming ; Liu, Xiangdong</creator><contributor>Gadekallu, Thippa Reddy</contributor><creatorcontrib>Wang, Haoming ; Liu, Xiangdong ; Gadekallu, Thippa Reddy</creatorcontrib><description>Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect prediction, resulting in substantial economic losses. 
Many studies have proposed the insolvency imbalance problem, but little attention has been paid to the effect of the undersampling technology. Therefore, a framework is used to spot-check algorithms quickly and combine which undersampling method and classification model performs best. The results show that Naive Bayes (NB) after Edited Nearest Neighbors (ENN) has the best performance, with an F2-measure of 0.423. In addition, by changing the undersampling rate of the cluster centroid-based method, we find that the performance of the Linear Discriminant Analysis (LDA) and Naive Bayes (NB) are affected by the undersampling rate. Neither of them is uniformly declining, and LDA has higher performance when the undersampling rate is 30%. This study accordingly provides another perspective and a guide for future design.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0254030</identifier><identifier>PMID: 34197533</identifier><language>eng</language><publisher>San Francisco: Public Library of Science</publisher><subject>Algorithms ; Analysis ; Bankruptcy ; Bayesian analysis ; Biology and Life Sciences ; Business failures ; Centroids ; Classification ; Computer and Information Sciences ; Datasets ; Discriminant analysis ; Economic impact ; Engineering and Technology ; Evaluation ; Historical account ; Information management ; Learning algorithms ; Literature reviews ; Machine learning ; Methods ; Neural networks ; Physical Sciences ; Predictions ; Research and Analysis Methods ; Social Sciences ; Support vector machines</subject><ispartof>PloS one, 2021-07, Vol.16 (7), p.e0254030-e0254030</ispartof><rights>COPYRIGHT 2021 Public Library of Science</rights><rights>2021 Wang, Liu. 
This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Wang, Liu 2021 Wang, Liu</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c669t-5f9a71029a3f257699ab5ae7dffdc0f3a1714c09f2e8edc1c6e8fac8595e79473</citedby><cites>FETCH-LOGICAL-c669t-5f9a71029a3f257699ab5ae7dffdc0f3a1714c09f2e8edc1c6e8fac8595e79473</cites><orcidid>0000-0003-3624-2127</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248686/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248686/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,724,777,781,861,882,2096,2915,23847,27905,27906,53772,53774,79349,79350</link.rule.ids></links><search><contributor>Gadekallu, Thippa Reddy</contributor><creatorcontrib>Wang, Haoming</creatorcontrib><creatorcontrib>Liu, Xiangdong</creatorcontrib><title>Undersampling bankruptcy prediction: Taiwan bankruptcy data</title><title>PloS one</title><description>Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect prediction, resulting in substantial economic losses. 
Many studies have proposed the insolvency imbalance problem, but little attention has been paid to the effect of the undersampling technology. Therefore, a framework is used to spot-check algorithms quickly and combine which undersampling method and classification model performs best. The results show that Naive Bayes (NB) after Edited Nearest Neighbors (ENN) has the best performance, with an F2-measure of 0.423. In addition, by changing the undersampling rate of the cluster centroid-based method, we find that the performance of the Linear Discriminant Analysis (LDA) and Naive Bayes (NB) are affected by the undersampling rate. Neither of them is uniformly declining, and LDA has higher performance when the undersampling rate is 30%. This study accordingly provides another perspective and a guide for future design.</description><subject>Algorithms</subject><subject>Analysis</subject><subject>Bankruptcy</subject><subject>Bayesian analysis</subject><subject>Biology and Life Sciences</subject><subject>Business failures</subject><subject>Centroids</subject><subject>Classification</subject><subject>Computer and Information Sciences</subject><subject>Datasets</subject><subject>Discriminant analysis</subject><subject>Economic impact</subject><subject>Engineering and Technology</subject><subject>Evaluation</subject><subject>Historical account</subject><subject>Information management</subject><subject>Learning algorithms</subject><subject>Literature reviews</subject><subject>Machine learning</subject><subject>Methods</subject><subject>Neural networks</subject><subject>Physical Sciences</subject><subject>Predictions</subject><subject>Research and Analysis Methods</subject><subject>Social Sciences</subject><subject>Support vector 
machines</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNkltrFDEUxwdR7EW_geCCIPVh19wmkygIpXhZKBS09TWcySS7WWeTMZlR--3NdkfZkT5IHhLO-eV_LvyL4hlGC0wr_HoThuihXXTBmwUiJUMUPSiOsaRkzgmiDw_eR8VJShuESio4f1wcUYZlVVJ6XLy98Y2JCbZd6_xqVoP_Foeu17ezLprG6d4F_2Z2De4n-MNsAz08KR5ZaJN5Ot6nxc2H99cXn-aXVx-XF-eXc8257OellVBhRCRQS8qKSwl1CaZqrG00shRwhZlG0hIjTKOx5kZY0KKUpakkq-hp8Xyv27UhqXHupPLIVcko4WUmlnuiCbBRXXRbiLcqgFN3gRBXCmLvdGuUyR0IQcvaasFYXdfSCCnqRhPCDJJ11no3Vhvqbe7H-D5COxGdZrxbq1X4oQRhggueBc5GgRi-Dyb1auuSNm0L3oThrm_BMEKUZvTFP-j9043UCvIAztuQ6-qdqDrnvEKCYL4ru7iHyqcxW6ezSazL8cmHV5MPmenNr34FQ0pq-eXz_7NXX6fsywN2baDt1ym0w85KaQqyPahjSCka-3fJGKmdx_9sQ-08rkaP0989qOzk</recordid><startdate>20210701</startdate><enddate>20210701</enddate><creator>Wang, Haoming</creator><creator>Liu, Xiangdong</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-3624-2127</orcidid></search><sort><creationdate>20210701</creationdate><title>Undersampling bankruptcy prediction: Taiwan bankruptcy data</title><author>Wang, Haoming ; Liu, 
Xiangdong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c669t-5f9a71029a3f257699ab5ae7dffdc0f3a1714c09f2e8edc1c6e8fac8595e79473</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Analysis</topic><topic>Bankruptcy</topic><topic>Bayesian analysis</topic><topic>Biology and Life Sciences</topic><topic>Business failures</topic><topic>Centroids</topic><topic>Classification</topic><topic>Computer and Information Sciences</topic><topic>Datasets</topic><topic>Discriminant analysis</topic><topic>Economic impact</topic><topic>Engineering and Technology</topic><topic>Evaluation</topic><topic>Historical account</topic><topic>Information management</topic><topic>Learning algorithms</topic><topic>Literature reviews</topic><topic>Machine learning</topic><topic>Methods</topic><topic>Neural networks</topic><topic>Physical Sciences</topic><topic>Predictions</topic><topic>Research and Analysis Methods</topic><topic>Social Sciences</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Haoming</creatorcontrib><creatorcontrib>Liu, Xiangdong</creatorcontrib><collection>CrossRef</collection><collection>Gale in Context : Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>ProQuest Nursing &amp; Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological &amp; Geoastrophysical Abstracts</collection><collection>Nucleic Acids 
Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>ProQuest Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>ProQuest Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Database‎ (1962 - current)</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research 
Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>https://resources.nclive.org/materials</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>Biological Sciences</collection><collection>Agriculture Science Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>ProQuest Biological Science Journals</collection><collection>Engineering Database</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials science collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Haoming</au><au>Liu, Xiangdong</au><au>Gadekallu, 
Thippa Reddy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Undersampling bankruptcy prediction: Taiwan bankruptcy data</atitle><jtitle>PloS one</jtitle><date>2021-07-01</date><risdate>2021</risdate><volume>16</volume><issue>7</issue><spage>e0254030</spage><epage>e0254030</epage><pages>e0254030-e0254030</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect prediction, resulting in substantial economic losses. Many studies have proposed the insolvency imbalance problem, but little attention has been paid to the effect of the undersampling technology. Therefore, a framework is used to spot-check algorithms quickly and combine which undersampling method and classification model performs best. The results show that Naive Bayes (NB) after Edited Nearest Neighbors (ENN) has the best performance, with an F2-measure of 0.423. In addition, by changing the undersampling rate of the cluster centroid-based method, we find that the performance of the Linear Discriminant Analysis (LDA) and Naive Bayes (NB) are affected by the undersampling rate. Neither of them is uniformly declining, and LDA has higher performance when the undersampling rate is 30%. This study accordingly provides another perspective and a guide for future design.</abstract><cop>San Francisco</cop><pub>Public Library of Science</pub><pmid>34197533</pmid><doi>10.1371/journal.pone.0254030</doi><tpages>e0254030</tpages><orcidid>https://orcid.org/0000-0003-3624-2127</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2021-07, Vol.16 (7), p.e0254030-e0254030
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_2547543265
source Public Library of Science; Full-Text Journals in Chemistry (Open access); DOAJ Directory of Open Access Journals; PubMed Central; EZB Electronic Journals Library
subjects Algorithms
Analysis
Bankruptcy
Bayesian analysis
Biology and Life Sciences
Business failures
Centroids
Classification
Computer and Information Sciences
Datasets
Discriminant analysis
Economic impact
Engineering and Technology
Evaluation
Historical account
Information management
Learning algorithms
Literature reviews
Machine learning
Methods
Neural networks
Physical Sciences
Predictions
Research and Analysis Methods
Social Sciences
Support vector machines
title Undersampling bankruptcy prediction: Taiwan bankruptcy data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T16%3A36%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Undersampling%20bankruptcy%20prediction:%20Taiwan%20bankruptcy%20data&rft.jtitle=PloS%20one&rft.au=Wang,%20Haoming&rft.date=2021-07-01&rft.volume=16&rft.issue=7&rft.spage=e0254030&rft.epage=e0254030&rft.pages=e0254030-e0254030&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0254030&rft_dat=%3Cgale_plos_%3EA667082166%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2547543265&rft_id=info:pmid/34197533&rft_galeid=A667082166&rft_doaj_id=oai_doaj_org_article_e5768835bfc844bbb9e898bdc224e09b&rfr_iscdi=true
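
Note on the F2-measure cited in the description: the abstract reports its best result as an F2-measure, which is the F-beta score with beta = 2 and therefore weights recall (catching bankrupt firms) more heavily than precision. The sketch below shows only the standard F-beta formula computed from confusion-matrix counts. The function name `f_beta` and the example counts are hypothetical and are not taken from the paper; the code does not reproduce the reported value of 0.423.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Return the F-beta score from confusion-matrix counts.

    tp, fp, fn are true positives, false positives and false negatives
    for the minority (bankrupt) class. beta > 1 weights recall above
    precision; beta = 2 gives the F2-measure.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # With no positive predictions and no positive labels the score is
    # undefined; this sketch returns 0.0 in that case (an assumption).
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    print(round(f_beta(tp=30, fp=70, fn=14), 4))
```

With beta = 2, a false negative (a missed bankruptcy) costs four times as much in the denominator as a false positive, which is why the measure suits imbalanced settings where missing the rare class is the more expensive error.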