Undersampling bankruptcy prediction: Taiwan bankruptcy data
Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect prediction, resulting in substantial economic losses. Many studies have proposed the insolvency imbal...
Published in: | PloS one 2021-07, Vol.16 (7), p.e0254030-e0254030 |
---|---|
Main authors: | Wang, Haoming; Liu, Xiangdong |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e0254030 |
---|---|
container_issue | 7 |
container_start_page | e0254030 |
container_title | PloS one |
container_volume | 16 |
creator | Wang, Haoming; Liu, Xiangdong |
description | Machine learning models are increasingly used in bankruptcy prediction. However, historical data on bankrupt companies are often imbalanced, which leads to incorrect predictions and, in turn, substantial economic losses. Many studies have addressed the imbalance problem in insolvency data, but little attention has been paid to the effect of undersampling techniques. Therefore, a framework is used to spot-check algorithms quickly and to identify which combination of undersampling method and classification model performs best. The results show that Naive Bayes (NB) after Edited Nearest Neighbors (ENN) has the best performance, with an F2-measure of 0.423. In addition, by changing the undersampling rate of the cluster centroid-based method, we find that the performance of Linear Discriminant Analysis (LDA) and Naive Bayes (NB) is affected by the undersampling rate. Neither declines uniformly, and LDA has higher performance when the undersampling rate is 30%. This study accordingly provides another perspective and a guide for future design. |
doi_str_mv | 10.1371/journal.pone.0254030 |
format | Article |
contributor | Gadekallu, Thippa Reddy |
date | 2021-07-01 |
doaj_id | oai_doaj_org_article_e5768835bfc844bbb9e898bdc224e09b |
eissn | 1932-6203 |
galeid | A667082166 |
lds50 | peer_reviewed |
linktohtml | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248686/ |
linktopdf | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248686/pdf/ |
oa | free_for_read |
orcidid | https://orcid.org/0000-0003-3624-2127 |
pmid | 34197533 |
pqid | 2547543265 |
publisher | San Francisco: Public Library of Science |
rights | COPYRIGHT 2021 Public Library of Science. 2021 Wang, Liu. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2021-07, Vol.16 (7), p.e0254030-e0254030 |
issn | 1932-6203 1932-6203 |
language | eng |
recordid | cdi_plos_journals_2547543265 |
source | Public Library of Science; Full-Text Journals in Chemistry (Open access); DOAJ Directory of Open Access Journals; PubMed Central; EZB Electronic Journals Library |
subjects | Algorithms; Analysis; Bankruptcy; Bayesian analysis; Biology and Life Sciences; Business failures; Centroids; Classification; Computer and Information Sciences; Datasets; Discriminant analysis; Economic impact; Engineering and Technology; Evaluation; Historical account; Information management; Learning algorithms; Literature reviews; Machine learning; Methods; Neural networks; Physical Sciences; Predictions; Research and Analysis Methods; Social Sciences; Support vector machines |
title | Undersampling bankruptcy prediction: Taiwan bankruptcy data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T16%3A36%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Undersampling%20bankruptcy%20prediction:%20Taiwan%20bankruptcy%20data&rft.jtitle=PloS%20one&rft.au=Wang,%20Haoming&rft.date=2021-07-01&rft.volume=16&rft.issue=7&rft.spage=e0254030&rft.epage=e0254030&rft.pages=e0254030-e0254030&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0254030&rft_dat=%3Cgale_plos_%3EA667082166%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2547543265&rft_id=info:pmid/34197533&rft_galeid=A667082166&rft_doaj_id=oai_doaj_org_article_e5768835bfc844bbb9e898bdc224e09b&rfr_iscdi=true |
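
The description reports its best result as an F2-measure. As a minimal sketch of that metric, assuming the standard F-beta definition with beta = 2 (which weights recall more heavily than precision), the function below computes it from confusion-matrix counts for the bankrupt class. The function name `f_beta` and the example counts are hypothetical and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion-matrix counts of the positive class.

    tp: bankrupt firms correctly flagged
    fp: healthy firms wrongly flagged
    fn: bankrupt firms that were missed
    beta > 1 weights recall more heavily than precision.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined),
        # so the score is taken to be zero by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    counts = dict(tp=30, fp=60, fn=20)
    print("F1:", round(f_beta(beta=1.0, **counts), 3))
    print("F2:", round(f_beta(beta=2.0, **counts), 3))
```

With beta = 2, a missed bankruptcy lowers the score more than a false alarm does, which is one plausible reason to prefer F2 over F1 when the bankrupt class is rare.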