OUBoost: boosting based over and under sampling technique for handling imbalanced data
Saved in:
Published in: | International journal of machine learning and cybernetics 2023-10, Vol.14 (10), p.3393-3411 |
---|---|
Main Authors: | Mostafaei, Sahar Hassanzadeh; Tanha, Jafar |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Full text |
creator | Mostafaei, Sahar Hassanzadeh; Tanha, Jafar |
description | Most real-world datasets contain imbalanced data. Learning from datasets where the number of samples in one class (the minority) is much smaller than in another (the majority) produces classifiers biased toward the majority class. On such datasets the overall prediction accuracy can exceed 90% while the accuracy on the minority class remains much lower. In this paper, we first propose a new technique for under-sampling the majority class of imbalanced datasets, based on the Peak clustering method. We then propose a novel boosting-based algorithm for learning from imbalanced datasets, named OUBoost, which combines the proposed Peak under-sampling algorithm with an over-sampling technique (SMOTE) in the boosting procedure. In the proposed OUBoost algorithm, misclassified examples are not given equal weights. OUBoost selects useful examples from the majority class and creates synthetic examples for the minority class; in effect, it indirectly updates the sample weights. We designed experiments using several evaluation metrics, such as Recall, MCC, Gmean, and F-score, on 30 real-world imbalanced datasets. The results show improved prediction performance on the minority class for most of the datasets when using OUBoost. We further report time comparisons and statistical tests to analyze the proposed algorithm in more detail. |
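The description above outlines a per-round resampling idea: over-sample the minority class with SMOTE and under-sample the majority class inside each boosting iteration. The step can be sketched as below; this is a simplified illustration, not the authors' OUBoost: the Peak-clustering under-sampling is replaced by plain random sampling, SMOTE is reduced to a 1-nearest-neighbour interpolation, and the names `smote_like` and `rebalance` are hypothetical.

```python
import random

def smote_like(minority, n_new, rng):
    """Create n_new synthetic minority points by interpolating between a
    random minority sample and its nearest minority neighbour (a simplified,
    1-nearest-neighbour variant of SMOTE; assumes >= 2 minority points)."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # nearest neighbour by squared Euclidean distance, excluding a itself
        nb = min((p for p in minority if p is not a),
                 key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, nb)))
    return synthetic

def rebalance(majority, minority, rng):
    """One boosting round's resampling step: under-sample the majority class
    and SMOTE-oversample the minority class so both reach the same size."""
    target = (len(majority) + len(minority)) // 2
    kept_majority = rng.sample(majority, min(target, len(majority)))
    grown_minority = minority + smote_like(minority, target - len(minority), rng)
    return kept_majority, grown_minority
```

In OUBoost itself the under-sampling is guided by Peak clustering rather than uniform random sampling, so informative majority examples are kept preferentially; a boosting loop would call a step like this each round before fitting the weak learner.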
doi_str_mv | 10.1007/s13042-023-01839-0 |
format | Article |
publisher | Berlin/Heidelberg: Springer Berlin Heidelberg |
rights | The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023 |
orcid | https://orcid.org/0000-0002-0779-6027 |
fulltext | fulltext |
identifier | ISSN: 1868-8071 |
ispartof | International journal of machine learning and cybernetics, 2023-10, Vol.14 (10), p.3393-3411 |
issn | 1868-8071 1868-808X |
language | eng |
recordid | cdi_proquest_journals_2919481984 |
source | SpringerLink Journals - AutoHoldings; ProQuest Central |
subjects | Accuracy; Algorithms; Artificial Intelligence; Classification; Clustering; Complex Systems; Computational Intelligence; Control; Data integrity; Datasets; Engineering; Learning; Machine learning; Mechatronics; Methods; Original Article; Pattern Recognition; Performance evaluation; Robotics; Sampling methods; Sampling techniques; Statistical tests; Systems Biology |
title | OUBoost: boosting based over and under sampling technique for handling imbalanced data |