Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification

An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particu...

Detailed description

Bibliographic details
Published in:	BioData mining 2016-12, Vol.9 (1), p.37, Article 37
Main authors:	Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L
Format:	Article
Language:	eng
Subjects:	Algorithms; Biology; Biomedical engineering; Data mining; Mathematical optimization; Methods; Technology application
Online access:	Full text

container_end_page
container_issue	1
container_start_page	37
container_title	BioData mining
container_volume	9
creator	Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L
description	An imbalanced dataset is a training dataset in which the interesting and uninteresting classes are represented in unequal proportions. In biomedical applications, samples from the interesting class, such as medical anomalies, positive clinical tests, and particular diseases, are often rare in a population. Because the target samples in the original dataset are few, inducing a classification model from such training data leads to poor prediction performance owing to insufficient training on the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling within a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, ASCB_DmSMOTE shows significant improvements, including higher accuracy and credibility. The proposed method combines two rebalancing techniques: it re-allocates the majority class in detail and dynamically optimises the two parameters of SMOTE to synthesise a reasonable number of minority-class samples for each clustered imbalanced sub-dataset. The proposed method ultimately outperforms other conventional methods and attains higher credibility together with greater accuracy of the classification model.
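The over-sampling step that the abstract builds on can be illustrated with a minimal sketch of plain SMOTE. This is not the authors' ASCB_DmSMOTE: the clustering, under-sampling, and swarm optimisation are omitted. The function name `smote` and the arguments `n_synthetic`, `k`, and `seed` are hypothetical names chosen for this illustration; they stand in for the two SMOTE parameters (amount of over-sampling and number of nearest neighbours) that the abstract says are tuned.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Plain SMOTE sketch (illustrative only, not ASCB_DmSMOTE).

    Creates n_synthetic points, each lying on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: four minority samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
```

In a scheme like the one the abstract describes, an optimiser would presumably search over values such as `n_synthetic` and `k` per cluster and score each candidate by classifier performance; that search loop is not shown here.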
doi_str_mv	10.1186/s13040-016-0117-1
format	Article
fullrecord	<record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5131504</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A472527881</galeid><sourcerecordid>A472527881</sourcerecordid><originalsourceid>FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</originalsourceid><addsrcrecordid>eNptUsuOFCEUrRiNM45-gBtD4spFjVDFqzcmnYmPSSYx8bEmQN3qpq2CFqge-7f8QqnucZxODCHAPefee4BTVS8JviRE8reJtJjiGhNeJhE1eVSdE8HKqZXk8YP9WfUspQ3GvMGsfVqdNWIhMRfyvPq97PQ2ux2gdKvjiOwwpQyxNjpBh7q916OzaJyG7OpgNmCP3L3Pa8gz4nyILu9R2EFMetwOzq9QBrv27ucESA-rGV-PqA8RZW1_HAjGeR33yI1GD9rbuZXOpWVOyPmChhE6Z_VwCBdROiXXl0B2wT-vnvR6SPDibr2ovn94_-3qU33z-eP11fKmtoyTXBsgfSNazrQWDXBMoTNccgkUS0sXjTBMdpTq3gjcCIv5ojMUDAFMmZCYthfVu2Pd7WSKHAs-Rz2obXRj0a6CduoU8W6tVmGnGGkJOxR4fVcghvIWKatNmKIvmhWRlLWcCrn4x1rpAZTzfSjF7OiSVUsqGtYIKUlhXf6HVUYH5YOCh96V-EnCm5OEwsnwK6_0lJK6_vrllEuOXBtDShH6-0sSrGanqaPTVHGamp2m5pxXD1_nPuOvtdo_NyrS9g</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1845364789</pqid></control><display><type>article</type><title>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</title><source>DOAJ Directory of Open Access Journals</source><source>SpringerLink Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>PubMed Central</source><source>Springer Nature OA/Free Journals</source><creator>Li, Jinyan ; Fong, Simon ; Sung, Yunsick ; Cho, Kyungeun ; Wong, Raymond ; Wong, Kelvin K L</creator><creatorcontrib>Li, Jinyan ; Fong, Simon ; Sung, Yunsick ; Cho, Kyungeun ; Wong, Raymond ; Wong, Kelvin K L</creatorcontrib><description>An imbalanced dataset is defined as a training dataset that has imbalanced 
proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. 
The proposed methods ultimately overcome other conventional methods and attains higher credibility with even greater accuracy of the classification model.</description><identifier>ISSN: 1756-0381</identifier><identifier>EISSN: 1756-0381</identifier><identifier>DOI: 10.1186/s13040-016-0117-1</identifier><identifier>PMID: 27980678</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Algorithms ; Biology ; Biomedical engineering ; Data mining ; Mathematical optimization ; Methods ; Technology application</subject><ispartof>BioData mining, 2016-12, Vol.9 (1), p.37, Article 37</ispartof><rights>COPYRIGHT 2016 BioMed Central Ltd.</rights><rights>Copyright BioMed Central 2016</rights><rights>The Author(s). 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</citedby><cites>FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5131504/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5131504/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,724,777,781,861,882,27905,27906,53772,53774</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27980678$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Jinyan</creatorcontrib><creatorcontrib>Fong, Simon</creatorcontrib><creatorcontrib>Sung, Yunsick</creatorcontrib><creatorcontrib>Cho, Kyungeun</creatorcontrib><creatorcontrib>Wong, Raymond</creatorcontrib><creatorcontrib>Wong, Kelvin K 
L</creatorcontrib><title>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</title><title>BioData mining</title><addtitle>BioData Min</addtitle><description>An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. 
The proposed methods ultimately overcome other conventional methods and attains higher credibility with even greater accuracy of the classification model.</description><subject>Algorithms</subject><subject>Biology</subject><subject>Biomedical engineering</subject><subject>Data mining</subject><subject>Mathematical optimization</subject><subject>Methods</subject><subject>Technology application</subject><issn>1756-0381</issn><issn>1756-0381</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNptUsuOFCEUrRiNM45-gBtD4spFjVDFqzcmnYmPSSYx8bEmQN3qpq2CFqge-7f8QqnucZxODCHAPefee4BTVS8JviRE8reJtJjiGhNeJhE1eVSdE8HKqZXk8YP9WfUspQ3GvMGsfVqdNWIhMRfyvPq97PQ2ux2gdKvjiOwwpQyxNjpBh7q916OzaJyG7OpgNmCP3L3Pa8gz4nyILu9R2EFMetwOzq9QBrv27ucESA-rGV-PqA8RZW1_HAjGeR33yI1GD9rbuZXOpWVOyPmChhE6Z_VwCBdROiXXl0B2wT-vnvR6SPDibr2ovn94_-3qU33z-eP11fKmtoyTXBsgfSNazrQWDXBMoTNccgkUS0sXjTBMdpTq3gjcCIv5ojMUDAFMmZCYthfVu2Pd7WSKHAs-Rz2obXRj0a6CduoU8W6tVmGnGGkJOxR4fVcghvIWKatNmKIvmhWRlLWcCrn4x1rpAZTzfSjF7OiSVUsqGtYIKUlhXf6HVUYH5YOCh96V-EnCm5OEwsnwK6_0lJK6_vrllEuOXBtDShH6-0sSrGanqaPTVHGamp2m5pxXD1_nPuOvtdo_NyrS9g</recordid><startdate>20161201</startdate><enddate>20161201</enddate><creator>Li, Jinyan</creator><creator>Fong, Simon</creator><creator>Sung, Yunsick</creator><creator>Cho, Kyungeun</creator><creator>Wong, Raymond</creator><creator>Wong, Kelvin K L</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>8AL</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>5PM</scope></search><sort><creationdate>20161201</creationdate><title>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</title><author>Li, Jinyan ; Fong, Simon ; Sung, Yunsick ; Cho, Kyungeun ; Wong, Raymond ; Wong, Kelvin K L</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Biology</topic><topic>Biomedical engineering</topic><topic>Data mining</topic><topic>Mathematical optimization</topic><topic>Methods</topic><topic>Technology application</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, 
Jinyan</creatorcontrib><creatorcontrib>Fong, Simon</creatorcontrib><creatorcontrib>Sung, Yunsick</creatorcontrib><creatorcontrib>Cho, Kyungeun</creatorcontrib><creatorcontrib>Wong, Raymond</creatorcontrib><creatorcontrib>Wong, Kelvin K L</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central 
Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Biological Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BioData mining</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Jinyan</au><au>Fong, Simon</au><au>Sung, Yunsick</au><au>Cho, Kyungeun</au><au>Wong, Raymond</au><au>Wong, Kelvin K L</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</atitle><jtitle>BioData mining</jtitle><addtitle>BioData 
Min</addtitle><date>2016-12-01</date><risdate>2016</risdate><volume>9</volume><issue>1</issue><spage>37</spage><pages>37-</pages><artnum>37</artnum><issn>1756-0381</issn><eissn>1756-0381</eissn><abstract>An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed methods ultimately overcome other conventional methods and attains higher credibility with even greater accuracy of the classification model.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>27980678</pmid><doi>10.1186/s13040-016-0117-1</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1756-0381
ispartof	BioData mining, 2016-12, Vol.9 (1), p.37, Article 37
issn	1756-0381 1756-0381
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5131504
source	DOAJ Directory of Open Access Journals; SpringerLink Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; PubMed Central; Springer Nature OA/Free Journals
subjects	Algorithms; Biology; Biomedical engineering; Data mining; Mathematical optimization; Methods; Technology application
title	Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T12%3A17%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adaptive%20swarm%20cluster-based%20dynamic%20multi-objective%20synthetic%20minority%20oversampling%20technique%20algorithm%20for%20tackling%20binary%20imbalanced%20datasets%20in%20biomedical%20data%20classification&rft.jtitle=BioData%20mining&rft.au=Li,%20Jinyan&rft.date=2016-12-01&rft.volume=9&rft.issue=1&rft.spage=37&rft.pages=37-&rft.artnum=37&rft.issn=1756-0381&rft.eissn=1756-0381&rft_id=info:doi/10.1186/s13040-016-0117-1&rft_dat=%3Cgale_pubme%3EA472527881%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1845364789&rft_id=info:pmid/27980678&rft_galeid=A472527881&rfr_iscdi=true