Bankruptcy prediction using optimal ensemble models under balanced and imbalanced data

This study explores the performance of gradient boosting methods in bankruptcy prediction for a highly imbalanced dataset. We developed different heterogeneous ensemble models based on three popular gradient boosting methods: XGBoost, LightGBM, and CatBoost. Our ensemble models were optimized using th...
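
The abstract does not say how the three base learners are combined, so the following is only a minimal sketch of one common option: soft voting, where the blend weights are chosen by the mean per-fold score on validation folds. Everything here is an assumption made for illustration. The function names (`auc`, `blend`, `best_weights`), the choice of AUC as the metric, the integer weight grid, and the toy scores standing in for XGBoost, LightGBM, and CatBoost outputs are hypothetical and are not taken from the article.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def blend(model_scores, weights):
    """Soft voting: weighted average of the per-model probabilities."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, row)) / total
            for row in zip(*model_scores)]


def mean_fold_auc(labels, scores, folds):
    """Average the AUC computed separately on each validation fold."""
    fold_aucs = [auc([labels[i] for i in fold], [scores[i] for i in fold])
                 for fold in folds]
    return sum(fold_aucs) / len(fold_aucs)


def best_weights(model_scores, labels, folds, steps=4):
    """Grid-search integer weights 0..steps that maximize mean fold AUC."""
    best, best_score = None, -1.0
    for weights in product(range(steps + 1), repeat=len(model_scores)):
        if sum(weights) == 0:
            continue  # an all-zero weight vector defines no blend
        score = mean_fold_auc(labels, blend(model_scores, weights), folds)
        if score > best_score:
            best, best_score = weights, score
    return best, best_score


# Toy, made-up data: 15 firms, 3 of them bankrupt (label 1), split in 3 folds.
labels = [0, 0, 0, 0, 1] * 3
folds = [range(0, 5), range(5, 10), range(10, 15)]
# Hypothetical out-of-fold probabilities from three base learners.
model_a = [0.1, 0.2, 0.3, 0.6, 0.5, 0.2, 0.1, 0.4, 0.3, 0.9, 0.3, 0.2, 0.1, 0.4, 0.8]
model_b = [0.2, 0.1, 0.5, 0.3, 0.6, 0.1, 0.3, 0.2, 0.7, 0.6, 0.2, 0.4, 0.3, 0.1, 0.7]
model_c = [0.3, 0.4, 0.2, 0.1, 0.4, 0.5, 0.2, 0.1, 0.3, 0.7, 0.1, 0.2, 0.6, 0.3, 0.5]
scores = [model_a, model_b, model_c]

weights, score = best_weights(scores, labels, folds)
print("chosen weights:", weights, "mean fold AUC:", score)
```

Because the grid contains the single-model weightings such as `(1, 0, 0)`, the selected blend can never score below the best base learner on these validation folds. Whether it also wins on a hold-out test set, as the abstract reports for the authors' models, has to be checked separately.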

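Bibliographic details
Published in: Expert systems 2024-08, Vol.41 (8), p.n/a
Main authors: Amirshahi, Bahareh; Lahmiri, Salim
Format: Article
Language: eng
Subjects: Bankruptcy; bankruptcy prediction; Datasets; gradient boosting methods; imbalanced dataset; Machine learning; optimal ensemble models; Oversampling
Online access: Full text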
container_end_page n/a
container_issue 8
container_start_page
container_title Expert systems
container_volume 41
creator Amirshahi, Bahareh
Lahmiri, Salim
description This study explores the performance of gradient boosting methods in bankruptcy prediction for a highly imbalanced dataset. We developed different heterogeneous ensemble models based on three popular gradient boosting methods: XGBoost, LightGBM, and CatBoost. Our ensemble models were optimized using cross‐validation, and the results on the hold‐out test sets showed that the optimized ensemble models not only outperform their base learners but also improve on the state‐of‐the‐art benchmark results for the same dataset. Interestingly, we observed that data oversampling, a technique commonly used to address class imbalance, had an adverse impact on our ensemble models' performance. This indicates that our models are robust to the imbalanced dataset problem that typically degrades the classification performance of machine learning models.
doi_str_mv 10.1111/exsy.13599
format Article
fullrecord sourceid: proquest_cross
recordid: TN_cdi_proquest_journals_3075437005
sourcerecordid: 3075437005
sourcetype: Aggregation Database
recordtype: article
eissn: 1468-0394
publisher: Oxford: Blackwell Publishing Ltd
rights: 2024 The Authors. published by John Wiley & Sons Ltd.
rights: 2024. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
peer_reviewed: yes
oa: free_for_read
orcid: https://orcid.org/0000-0002-9237-4100
tpages: 25
creationdate: 2024-08
linktopdf: https://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fexsy.13599
linktohtml: https://onlinelibrary.wiley.com/doi/full/10.1111%2Fexsy.13599
collection: Wiley Online Library Open Access; Wiley Online Library (Open Access Collection); CrossRef; Computer and Information Systems Abstracts; Mechanical & Transportation Engineering Abstracts; Technology Research Database; ANTE: Abstracts in New Technology & Engineering; Engineering Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional
delcategory: Remote Search Resource
fulltext fulltext
identifier ISSN: 0266-4720
ispartof Expert systems, 2024-08, Vol.41 (8), p.n/a
issn 0266-4720
1468-0394
language eng
recordid cdi_proquest_journals_3075437005
source Access via Wiley Online Library
subjects Bankruptcy
bankruptcy prediction
Datasets
gradient boosting methods
imbalanced dataset
Machine learning
optimal ensemble models
Oversampling
title Bankruptcy prediction using optimal ensemble models under balanced and imbalanced data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T05%3A14%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bankruptcy%20prediction%20using%20optimal%20ensemble%20models%20under%20balanced%20and%20imbalanced%20data&rft.jtitle=Expert%20systems&rft.au=Amirshahi,%20Bahareh&rft.date=2024-08&rft.volume=41&rft.issue=8&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.13599&rft_dat=%3Cproquest_cross%3E3075437005%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3075437005&rft_id=info:pmid/&rfr_iscdi=true