Entropy-based Sampling Approaches for Multi-Class Imbalanced Problems
In data mining, large differences among multi-class distributions, known as class imbalance, are known to hinder classification performance. Unfortunately, existing sampling methods have notable deficiencies: oversampling techniques can cause over-generation and overlapping, while undersampling techniques can discard significant information. This paper presents three sampling approaches for imbalanced learning: an entropy-based oversampling (EOS) approach, an entropy-based undersampling (EUS) approach, and an entropy-based hybrid sampling (EHS) approach that combines both. All three are built on a new class-imbalance metric, the entropy-based imbalance degree (EID), which considers the differences in information content between classes rather than the traditional imbalance ratio. Specifically, after evaluating the information-influence degree of each instance, EOS balances the data set by generating new instances around difficult-to-learn instances and retaining only the informative ones, EUS removes easy-to-learn instances, and EHS does both simultaneously. The generated and remaining instances are then used to train several classifiers. Extensive experiments on synthetic and real-world data sets demonstrate the effectiveness of the approaches.
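This record does not include the paper's formulas, so the following is only a rough illustrative sketch of the general idea, not the authors' algorithm. Per-instance difficulty is approximated here by the Shannon entropy of the class labels among each point's k nearest neighbours (a hypothetical stand-in for the paper's information-influence degree), and an EHS-style hybrid step drops the easiest majority instances while synthesizing new points near the hardest minority instances. All function names, the jitter scheme, and the balancing target are invented for illustration.

```python
import numpy as np

def knn_label_entropy(X, y, k=5):
    """Per-instance difficulty score: Shannon entropy of the class labels
    among each point's k nearest neighbours. High entropy = mixed
    neighbourhood = difficult-to-learn; zero entropy = easy-to-learn."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude each point itself
    nbrs = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    classes = np.unique(y)
    H = np.empty(len(y))
    for i, idx in enumerate(nbrs):
        p = np.array([(y[idx] == c).mean() for c in classes])
        p = p[p > 0]
        H[i] = -(p * np.log2(p)).sum()
    return H

def entropy_hybrid_sample(X, y, k=5, rng=None):
    """EHS-style sketch: undersample the majority class by removing its
    lowest-entropy (easiest) instances, and oversample each minority
    class by jitter-duplicating its highest-entropy (hardest) instances,
    until every class matches the largest minority-class count."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    H = knn_label_entropy(X, y, k)
    counts = {c: int((y == c).sum()) for c in np.unique(y)}
    majority = max(counts, key=counts.get)
    target = max(n for c, n in counts.items() if c != majority)
    keep = np.ones(len(y), bool)
    # Undersampling: drop the easiest majority instances down to `target`.
    maj_idx = np.where(y == majority)[0]
    drop = maj_idx[np.argsort(H[maj_idx])][: counts[majority] - target]
    keep[drop] = False
    X_parts, y_parts = [X[keep]], [y[keep]]
    # Oversampling: synthesize near the hardest instances of each minority class.
    for c in np.unique(y):
        need = target - counts[c]
        if c == majority or need <= 0:
            continue
        idx = np.where(y == c)[0]
        hard = idx[np.argsort(H[idx])[::-1]]             # hardest first
        seeds = hard[rng.integers(0, len(hard), need)]
        X_parts.append(X[seeds] + rng.normal(0, 0.05, (need, X.shape[1])))
        y_parts.append(np.full(need, c))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

For example, on a data set with class counts 30/8/5, this sketch trims the majority class to 8 and grows the 5-instance class to 8, so all classes end balanced. The real EOS/EUS/EHS methods additionally filter generated instances by informativeness, which this sketch omits.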
Saved in:
Published in: | IEEE transactions on knowledge and data engineering 2020-11, Vol.32 (11), p.2159-2170 |
---|---|
Main authors: | Li, Lusi; He, Haibo; Li, Jie |
Format: | Article |
Language: | eng |
Subjects: | Data mining; Datasets; Earth Observing System; Entropy; hybrid sampling; Imbalanced learning; Information entropy; Measurement uncertainty; Oversampling; Sampling methods; Uncertainty; undersampling |
Online access: | Order full text |
container_end_page | 2170 |
---|---|
container_issue | 11 |
container_start_page | 2159 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 32 |
creator | Li, Lusi; He, Haibo; Li, Jie |
description | In data mining, large differences among multi-class distributions, known as class imbalance, are known to hinder classification performance. Unfortunately, existing sampling methods have notable deficiencies: oversampling techniques can cause over-generation and overlapping, while undersampling techniques can discard significant information. This paper presents three sampling approaches for imbalanced learning: an entropy-based oversampling (EOS) approach, an entropy-based undersampling (EUS) approach, and an entropy-based hybrid sampling (EHS) approach that combines both. All three are built on a new class-imbalance metric, the entropy-based imbalance degree (EID), which considers the differences in information content between classes rather than the traditional imbalance ratio. Specifically, after evaluating the information-influence degree of each instance, EOS balances the data set by generating new instances around difficult-to-learn instances and retaining only the informative ones, EUS removes easy-to-learn instances, and EHS does both simultaneously. The generated and remaining instances are then used to train several classifiers. Extensive experiments on synthetic and real-world data sets demonstrate the effectiveness of the approaches. |
doi_str_mv | 10.1109/TKDE.2019.2913859 |
format | Article |
publisher | New York: IEEE |
eissn | 1558-2191 |
coden | ITKEEH |
ieee_id | 8703114 |
orcid | 0000-0002-4323-2632; 0000-0002-3103-4452; 0000-0002-7075-4145 |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2020-11, Vol.32 (11), p.2159-2170 |
issn | 1041-4347; 1558-2191 |
language | eng |
recordid | cdi_proquest_journals_2449308543 |
source | IEEE Xplore |
subjects | Data mining; Datasets; Earth Observing System; Entropy; hybrid sampling; Imbalanced learning; Information entropy; Measurement uncertainty; Oversampling; Sampling methods; Uncertainty; undersampling |
title | Entropy-based Sampling Approaches for Multi-Class Imbalanced Problems |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T17%3A12%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Entropy-based%20Sampling%20Approaches%20for%20Multi-Class%20Imbalanced%20Problems&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Li,%20Lusi&rft.date=2020-11-01&rft.volume=32&rft.issue=11&rft.spage=2159&rft.epage=2170&rft.pages=2159-2170&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2019.2913859&rft_dat=%3Cproquest_RIE%3E2449308543%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2449308543&rft_id=info:pmid/&rft_ieee_id=8703114&rfr_iscdi=true |