Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification
An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particu...
Published in: | BioData mining 2016-12, Vol.9 (1), p.37, Article 37 |
---|---|
Main authors: | Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | 37 |
container_title | BioData mining |
container_volume | 9 |
creator | Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L |
description | An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class.
In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE.
Our proposed method combines two rebalancing techniques. It re-allocates the majority class in detail and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed method ultimately outperforms other conventional methods and attains higher credibility with even greater accuracy of the classification model. |
doi_str_mv | 10.1186/s13040-016-0117-1 |
format | Article |
fullrecord | <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5131504</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A472527881</galeid><sourcerecordid>A472527881</sourcerecordid><originalsourceid>FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</originalsourceid><addsrcrecordid>eNptUsuOFCEUrRiNM45-gBtD4spFjVDFqzcmnYmPSSYx8bEmQN3qpq2CFqge-7f8QqnucZxODCHAPefee4BTVS8JviRE8reJtJjiGhNeJhE1eVSdE8HKqZXk8YP9WfUspQ3GvMGsfVqdNWIhMRfyvPq97PQ2ux2gdKvjiOwwpQyxNjpBh7q916OzaJyG7OpgNmCP3L3Pa8gz4nyILu9R2EFMetwOzq9QBrv27ucESA-rGV-PqA8RZW1_HAjGeR33yI1GD9rbuZXOpWVOyPmChhE6Z_VwCBdROiXXl0B2wT-vnvR6SPDibr2ovn94_-3qU33z-eP11fKmtoyTXBsgfSNazrQWDXBMoTNccgkUS0sXjTBMdpTq3gjcCIv5ojMUDAFMmZCYthfVu2Pd7WSKHAs-Rz2obXRj0a6CduoU8W6tVmGnGGkJOxR4fVcghvIWKatNmKIvmhWRlLWcCrn4x1rpAZTzfSjF7OiSVUsqGtYIKUlhXf6HVUYH5YOCh96V-EnCm5OEwsnwK6_0lJK6_vrllEuOXBtDShH6-0sSrGanqaPTVHGamp2m5pxXD1_nPuOvtdo_NyrS9g</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1845364789</pqid></control><display><type>article</type><title>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</title><source>DOAJ Directory of Open Access Journals</source><source>SpringerLink Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>PubMed Central</source><source>Springer Nature OA/Free Journals</source><creator>Li, Jinyan ; Fong, Simon ; Sung, Yunsick ; Cho, Kyungeun ; Wong, Raymond ; Wong, Kelvin K L</creator><creatorcontrib>Li, Jinyan ; Fong, Simon ; Sung, Yunsick ; Cho, Kyungeun ; Wong, Raymond ; Wong, Kelvin K L</creatorcontrib><description>An imbalanced dataset is defined as a training dataset that has imbalanced 
proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class.
In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE.
Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed methods ultimately overcome other conventional methods and attains higher credibility with even greater accuracy of the classification model.</description><identifier>ISSN: 1756-0381</identifier><identifier>EISSN: 1756-0381</identifier><identifier>DOI: 10.1186/s13040-016-0117-1</identifier><identifier>PMID: 27980678</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Algorithms ; Biology ; Biomedical engineering ; Data mining ; Mathematical optimization ; Methods ; Technology application</subject><ispartof>BioData mining, 2016-12, Vol.9 (1), p.37, Article 37</ispartof><rights>COPYRIGHT 2016 BioMed Central Ltd.</rights><rights>Copyright BioMed Central 2016</rights><rights>The Author(s). 
2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</citedby><cites>FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5131504/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5131504/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,724,777,781,861,882,27905,27906,53772,53774</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27980678$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Jinyan</creatorcontrib><creatorcontrib>Fong, Simon</creatorcontrib><creatorcontrib>Sung, Yunsick</creatorcontrib><creatorcontrib>Cho, Kyungeun</creatorcontrib><creatorcontrib>Wong, Raymond</creatorcontrib><creatorcontrib>Wong, Kelvin K L</creatorcontrib><title>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</title><title>BioData mining</title><addtitle>BioData Min</addtitle><description>An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. 
Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class.
In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE.
Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed methods ultimately overcome other conventional methods and attains higher credibility with even greater accuracy of the classification model.</description><subject>Algorithms</subject><subject>Biology</subject><subject>Biomedical engineering</subject><subject>Data mining</subject><subject>Mathematical optimization</subject><subject>Methods</subject><subject>Technology application</subject><issn>1756-0381</issn><issn>1756-0381</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNptUsuOFCEUrRiNM45-gBtD4spFjVDFqzcmnYmPSSYx8bEmQN3qpq2CFqge-7f8QqnucZxODCHAPefee4BTVS8JviRE8reJtJjiGhNeJhE1eVSdE8HKqZXk8YP9WfUspQ3GvMGsfVqdNWIhMRfyvPq97PQ2ux2gdKvjiOwwpQyxNjpBh7q916OzaJyG7OpgNmCP3L3Pa8gz4nyILu9R2EFMetwOzq9QBrv27ucESA-rGV-PqA8RZW1_HAjGeR33yI1GD9rbuZXOpWVOyPmChhE6Z_VwCBdROiXXl0B2wT-vnvR6SPDibr2ovn94_-3qU33z-eP11fKmtoyTXBsgfSNazrQWDXBMoTNccgkUS0sXjTBMdpTq3gjcCIv5ojMUDAFMmZCYthfVu2Pd7WSKHAs-Rz2obXRj0a6CduoU8W6tVmGnGGkJOxR4fVcghvIWKatNmKIvmhWRlLWcCrn4x1rpAZTzfSjF7OiSVUsqGtYIKUlhXf6HVUYH5YOCh96V-EnCm5OEwsnwK6_0lJK6_vrllEuOXBtDShH6-0sSrGanqaPTVHGamp2m5pxXD1_nPuOvtdo_NyrS9g</recordid><startdate>20161201</startdate><enddate>20161201</enddate><creator>Li, Jinyan</creator><creator>Fong, Simon</creator><creator>Sung, Yunsick</creator><creator>Cho, Kyungeun</creator><creator>Wong, Raymond</creator><creator>Wong, Kelvin K L</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>8AL</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>5PM</scope></search><sort><creationdate>20161201</creationdate><title>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</title><author>Li, Jinyan ; Fong, Simon ; Sung, Yunsick ; Cho, Kyungeun ; Wong, Raymond ; Wong, Kelvin K L</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c561t-be1f27365aa72e604edb6868e408c4927b58d44afb7027c069db4eb1e04578043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Biology</topic><topic>Biomedical engineering</topic><topic>Data mining</topic><topic>Mathematical optimization</topic><topic>Methods</topic><topic>Technology application</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, 
Jinyan</creatorcontrib><creatorcontrib>Fong, Simon</creatorcontrib><creatorcontrib>Sung, Yunsick</creatorcontrib><creatorcontrib>Cho, Kyungeun</creatorcontrib><creatorcontrib>Wong, Raymond</creatorcontrib><creatorcontrib>Wong, Kelvin K L</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central 
Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Biological Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BioData mining</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Jinyan</au><au>Fong, Simon</au><au>Sung, Yunsick</au><au>Cho, Kyungeun</au><au>Wong, Raymond</au><au>Wong, Kelvin K L</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification</atitle><jtitle>BioData mining</jtitle><addtitle>BioData 
Min</addtitle><date>2016-12-01</date><risdate>2016</risdate><volume>9</volume><issue>1</issue><spage>37</spage><pages>37-</pages><artnum>37</artnum><issn>1756-0381</issn><eissn>1756-0381</eissn><abstract>An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class.
In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE.
Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed methods ultimately overcome other conventional methods and attains higher credibility with even greater accuracy of the classification model.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>27980678</pmid><doi>10.1186/s13040-016-0117-1</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1756-0381 |
ispartof | BioData mining, 2016-12, Vol.9 (1), p.37, Article 37 |
issn | 1756-0381 1756-0381 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5131504 |
source | DOAJ Directory of Open Access Journals; SpringerLink Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; PubMed Central; Springer Nature OA/Free Journals |
subjects | Algorithms; Biology; Biomedical engineering; Data mining; Mathematical optimization; Methods; Technology application |
title | Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T12%3A17%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adaptive%20swarm%20cluster-based%20dynamic%20multi-objective%20synthetic%20minority%20oversampling%20technique%20algorithm%20for%20tackling%20binary%20imbalanced%20datasets%20in%20biomedical%20data%20classification&rft.jtitle=BioData%20mining&rft.au=Li,%20Jinyan&rft.date=2016-12-01&rft.volume=9&rft.issue=1&rft.spage=37&rft.pages=37-&rft.artnum=37&rft.issn=1756-0381&rft.eissn=1756-0381&rft_id=info:doi/10.1186/s13040-016-0117-1&rft_dat=%3Cgale_pubme%3EA472527881%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1845364789&rft_id=info:pmid/27980678&rft_galeid=A472527881&rfr_iscdi=true |
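
The abstract above refers to SMOTE and to "the two parameters of SMOTE" without naming them. As a rough illustration of the underlying oversampling step only, the following is a minimal sketch of plain SMOTE-style interpolation. It is not the authors' ASCB_DmSMOTE: it contains no clustering, no under-sampling and no swarm optimisation. The function name `smote_oversample`, the parameters `n_new` and `k`, and the toy data are all assumptions made for this example; treating the oversampling amount and the neighbour count as the two tunable parameters is likewise an assumption.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (the rare class).
    n_new:    number of synthetic samples to create (assumed parameter 1).
    k:        number of nearest minority neighbours to draw from (assumed parameter 2).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy minority class with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. A method that tunes `n_new` and `k` per cluster, as the abstract describes, would call a routine like this repeatedly with different settings and score the resulting classifier.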