Improved LightGBM for Extremely Imbalanced Data and Application to Credit Card Fraud Detection

Credit card fraud (CCF) is a significant threat to cardholders and financial institutions. CCF detection against this threat is challenging due to extremely imbalanced data (EID). EID involves extremely few instances of fraud for training and an extremely high risk of overlooking fraud. While class...

Detailed description

Bibliographic details
Published in:	IEEE Access, 2024, Vol. 12, pp. 159316-159335
Main authors:	Zhao, Xiaosong; Liu, Yong; Zhao, Qiangfu
Format:	Article
Language:	eng
Subjects:	Accuracy; Boosting; Class balancing cost-harmonization LightGBM; Classification algorithms; cost-sensitive; Costs; credit card fraud detection; Credit cards; Data models; extremely imbalanced data; Fraud; interpretability; Loss measurement; oversampling; Synthetic data; Training
Online access:	Full text

container_end_page	159335
container_issue
container_start_page	159316
container_title	IEEE access
container_volume	12
creator	Zhao, Xiaosong; Liu, Yong; Zhao, Qiangfu
description	Credit card fraud (CCF) is a significant threat to cardholders and financial institutions. CCF detection against this threat is challenging due to extremely imbalanced data (EID). EID involves extremely few instances of fraud for training and an extremely high risk of overlooking fraud. While class balancing or oversampling techniques can address the former problem by punishing negative classes or augmenting the positive data, they do not mitigate the latter. In contrast, the cost-sensitive learning approach targets only the high risk of false negative errors. Therefore, existing approaches are insufficient to solve all the issues of the EID problem. Based on the LightGBM (Light Gradient Boosting Machine) framework, this study introduces two novel machine-learning methods: the class balancing cost-harmonization LightGBM (CB-CHL-LightGBM) and the oversampling cost-harmonization LightGBM (OS-CHL-LightGBM). The new approaches combine class balancing or oversampling technology with LightGBM to solve the EID problem comprehensively. They enhance the efficacy of LightGBM in CCF detection scenarios. Experimental results on three CCF datasets indicate that the two proposed methods outperform LightGBM in several crucial performance metrics. For example, compared with the original LightGBM, CB-CHL-LightGBM or OS-CHL-LightGBM can increase the F2-score from 0.77 to 0.83 for the first dataset, from 0.77 to 0.86 for the second dataset, and from 0.70 to 0.82 for the third dataset. However, adding class balancing, oversampling, and cost-harmonization loss separately to LightGBM may not obtain better results.
doi_str_mv	10.1109/ACCESS.2024.3487212
format	Article
coden	IAECCG
doaj_id	oai_doaj_org_article_2e03618967114c458023b8752332785f
eissn	2169-3536
ieee_id	10737073
linktohtml	https://ieeexplore.ieee.org/document/10737073
oa	free_for_read
orcid	https://orcid.org/0000-0002-7146-254X; https://orcid.org/0000-0002-4663-6739; https://orcid.org/0000-0003-3101-749X
publisher	IEEE
toplevel	peer_reviewed; online_resources
tpages	20
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2024, Vol.12, p.159316-159335
issn	2169-3536 (ISSN); 2169-3536 (EISSN)
language	eng
recordid	cdi_crossref_primary_10_1109_ACCESS_2024_3487212
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Accuracy; Boosting; Class balancing cost-harmonization LightGBM; Classification algorithms; cost-sensitive; Costs; credit card fraud detection; Credit cards; Data models; extremely imbalanced data; Fraud; interpretability; Loss measurement; oversampling; Synthetic data; Training
title	Improved LightGBM for Extremely Imbalanced Data and Application to Credit Card Fraud Detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T19%3A54%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-doaj_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improved%20LightGBM%20for%20Extremely%20Imbalanced%20Data%20and%20Application%20to%20Credit%20Card%20Fraud%20Detection&rft.jtitle=IEEE%20access&rft.au=Zhao,%20Xiaosong&rft.date=2024&rft.volume=12&rft.spage=159316&rft.epage=159335&rft.pages=159316-159335&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3487212&rft_dat=%3Cdoaj_cross%3Eoai_doaj_org_article_2e03618967114c458023b8752332785f%3C/doaj_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10737073&rft_doaj_id=oai_doaj_org_article_2e03618967114c458023b8752332785f&rfr_iscdi=true