OUBoost: boosting based over and under sampling technique for handling imbalanced data

Bibliographic Details
Published in: International journal of machine learning and cybernetics, 2023-10, Vol. 14 (10), p. 3393-3411
Main authors: Mostafaei, Sahar Hassanzadeh; Tanha, Jafar
Format: Article
Language: English
Online access: Full text
Description: Most real-world datasets contain imbalanced data. Learning from datasets where the number of samples in one class (the minority) is much smaller than in another class (the majority) produces classifiers that are biased toward the majority class. The overall prediction accuracy on such datasets is often higher than 90%, while the accuracy for the minority class is considerably lower. In this paper, we first propose a new technique for under-sampling the majority class of imbalanced datasets based on the Peak clustering method. We then propose a novel boosting-based algorithm for learning from imbalanced datasets, named OUBoost, which combines the proposed Peak under-sampling algorithm with the SMOTE over-sampling technique inside the boosting procedure. In OUBoost, misclassified examples are not given equal weights: the algorithm selects useful examples from the majority class and creates synthetic examples for the minority class, thereby updating the sample weights indirectly. We designed experiments on 30 real-world imbalanced datasets using several evaluation metrics, such as Recall, MCC, Gmean, and F-score. The results show improved prediction performance on the minority class for most of the datasets when using OUBoost. We further report time comparisons and statistical tests to analyze the proposed algorithm in more detail.
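
The short Python sketch below illustrates the general idea described in the abstract; it is not the authors' implementation. An AdaBoost-style loop re-samples the training set in every round: the majority class is under-sampled (random selection here stands in for the paper's Peak clustering-based selection) and the minority class is augmented with simplified SMOTE-style interpolated samples before a weak learner is fit. The function names ouboost_sketch and smote_like are hypothetical.

    # Minimal sketch of boosting with per-round over- and under-sampling.
    # NOTE: this is an illustration, not the published OUBoost code; random
    # under-sampling stands in for the paper's Peak clustering-based selection,
    # and smote_like is a simplified stand-in for SMOTE.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def smote_like(X_min, n_new):
        # Interpolate between random pairs of minority samples (simplified SMOTE).
        i = rng.integers(0, len(X_min), n_new)
        j = rng.integers(0, len(X_min), n_new)
        lam = rng.random((n_new, 1))
        return X_min[i] + lam * (X_min[j] - X_min[i])

    def ouboost_sketch(X, y, n_rounds=10):
        # y is 0/1 with 1 the minority class; returns weak learners and their votes.
        learners, alphas = [], []
        w = np.full(len(y), 1.0 / len(y))  # boosting weights over original samples
        for _ in range(n_rounds):
            maj = np.where(y == 0)[0]
            mino = np.where(y == 1)[0]
            keep = rng.choice(maj, size=len(mino), replace=False)  # under-sample majority
            X_syn = smote_like(X[mino], len(mino))                 # over-sample minority
            X_t = np.vstack([X[keep], X[mino], X_syn])
            y_t = np.concatenate([np.zeros(len(keep)), np.ones(2 * len(mino))])
            w_t = np.concatenate([w[keep], w[mino], np.full(len(X_syn), w[mino].mean())])
            stump = DecisionTreeClassifier(max_depth=1).fit(X_t, y_t, sample_weight=w_t)
            pred = stump.predict(X)
            err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)                  # AdaBoost-style learner weight
            w *= np.exp(alpha * (pred != y))                       # emphasize misclassified originals
            w /= w.sum()
            learners.append(stump)
            alphas.append(alpha)
        return learners, np.array(alphas)

    def ensemble_predict(learners, alphas, X):
        votes = sum(a * (2 * clf.predict(X) - 1) for clf, a in zip(learners, alphas))
        return (votes > 0).astype(int)

In the actual OUBoost algorithm, the under-sampling step is guided by Peak clustering of the majority class and the resampling itself is what updates the effective sample weights; the random choices above are placeholders for those components.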
DOI: 10.1007/s13042-023-01839-0
Publisher: Berlin/Heidelberg: Springer Berlin Heidelberg
Rights: The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023
ISSN: 1868-8071
EISSN: 1868-808X
Source: SpringerLink Journals - AutoHoldings; ProQuest Central
Subjects: Accuracy
Algorithms
Artificial Intelligence
Classification
Clustering
Complex Systems
Computational Intelligence
Control
Data integrity
Datasets
Engineering
Learning
Machine learning
Mechatronics
Methods
Original Article
Pattern Recognition
Performance evaluation
Robotics
Sampling methods
Sampling techniques
Statistical tests
Systems Biology