Dynamic clustering method for imbalanced learning based on AdaBoost

Our paper aims at learning from imbalanced data based on ensemble learning. At present, the main solution is to combine under-sampling, over-sampling, or cost-sensitive learning with ensemble learning. However, these feature-space-based methods fail to reflect changes in the data distribution and are usually accompanied by high computational complexity and a risk of overfitting. In this paper, we propose a dynamic clustering algorithm based on the coefficient of variation (or entropy), which learns the local spatial distribution of the data and hierarchically clusters the majority class. The algorithm has low complexity and can dynamically adjust the clusters across AdaBoost iterations, adapting to the changes caused by sample weight updates. We then design an index to measure the importance of each cluster and, based on this index, propose a dynamic sampling algorithm driven by maximum weight. The effectiveness of the sampling algorithm is demonstrated through visual experiments. Finally, we propose a cost-sensitive algorithm based on Bagging and combine it with the dynamic sampling algorithm into a multi-fusion imbalanced ensemble learning algorithm. In our experiments, the algorithms are validated on three artificial datasets, 22 KEEL datasets, and two gene-expression cancer datasets, and achieve performance equal to or better than the state of the art in terms of AUC, indicating that they are not only effective imbalanced-learning algorithms but also show potential for building a reliable biological cyber-physical system.
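
The abstract describes the pipeline only at a high level. The Python sketch below (using NumPy and scikit-learn) is a rough illustration of that pipeline under stated assumptions, not the authors' implementation: the function names, the coefficient-of-variation threshold, the weight-ranking split, and the rule of sampling clusters until the classes are balanced are all hypothetical. Only the broad steps come from the abstract: re-cluster the majority class from the current AdaBoost weights via the coefficient of variation, rank clusters by their maximum weight, undersample accordingly, and then apply a standard AdaBoost update.

```python
# Minimal sketch of the general idea, NOT the authors' exact algorithm: at each
# boosting round the majority class is re-clustered using the coefficient of
# variation of the current sample weights, each cluster is scored by its maximum
# weight, and the majority class is undersampled accordingly before the weak
# learner is fit. All thresholds and scoring details here are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def split_by_cv(indices, weights, cv_threshold=0.5, min_size=10):
    """Recursively split a cluster of majority samples while the coefficient of
    variation (std / mean) of its sample weights stays above cv_threshold."""
    w = weights[indices]
    cv = w.std() / (w.mean() + 1e-12)
    if cv <= cv_threshold or len(indices) < 2 * min_size:
        return [indices]
    order = indices[np.argsort(weights[indices])]   # split on weight ranking (assumption)
    mid = len(order) // 2
    return (split_by_cv(order[:mid], weights, cv_threshold, min_size) +
            split_by_cv(order[mid:], weights, cv_threshold, min_size))


def boost_with_dynamic_clustering(X, y, n_rounds=10, maj_label=0, min_label=1):
    """Hypothetical AdaBoost-style loop with per-round cluster-based undersampling."""
    n = len(y)
    weights = np.full(n, 1.0 / n)
    maj_idx, min_idx = np.where(y == maj_label)[0], np.where(y == min_label)[0]
    learners, alphas = [], []
    for _ in range(n_rounds):
        # 1. Re-cluster the majority class under the current weight distribution.
        clusters = split_by_cv(maj_idx, weights)
        # 2. Score each cluster by its maximum weight and draw samples from the
        #    most important clusters until the classes are balanced (assumption).
        scores = np.array([weights[c].max() for c in clusters])
        picked = []
        for ci in np.argsort(-scores):
            picked.extend(clusters[ci].tolist())
            if len(picked) >= len(min_idx):
                break
        train = np.concatenate([np.array(picked[: len(min_idx)]), min_idx])
        # 3. Fit a weak learner on the rebalanced subset, then do a standard
        #    AdaBoost weight update over the full training set.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X[train], y[train], sample_weight=weights[train])
        pred = stump.predict(X)
        err = np.clip(weights[pred != y].sum(), 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        weights *= np.exp(np.where(pred == y, -alpha, alpha))
        weights /= weights.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas
```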

Bibliographic details
Published in: The Journal of Supercomputing, 2020-12, Vol. 76 (12), p. 9716-9738
Main authors: Deng, Xiaoheng; Xu, Yuebin; Chen, Lingchi; Zhong, Weijian; Jolfaei, Alireza; Zheng, Xi
Format: Article
Language: English
Online access: Full text
DOI: 10.1007/s11227-020-03211-3
ISSN: 0920-8542
EISSN: 1573-0484
Publisher: Springer US, New York
Source: Springer Nature - Complete Springer Journals
Subjects: Algorithms
Clustering
Coefficient of variation
Compilers
Complexity
Computer Science
Datasets
Gene expression
Intelligent and Pervasive Computing for Cyber-Physical Systems
Interpreters
Iterative methods
Machine learning
Oversampling
Processor Architectures
Programming Languages
Spatial data
Spatial distribution
Weight
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T15%3A55%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dynamic%20clustering%20method%20for%20imbalanced%20learning%20based%20on%20AdaBoost&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Deng,%20Xiaoheng&rft.date=2020-12-01&rft.volume=76&rft.issue=12&rft.spage=9716&rft.epage=9738&rft.pages=9716-9738&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-020-03211-3&rft_dat=%3Cproquest_cross%3E2450310657%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2450310657&rft_id=info:pmid/&rfr_iscdi=true