Improved Self-Generating Prototypes Algorithm for Imbalanced Datasets

Some real-world datasets have skewed class proportions, with many instances of the majority classes and only a few of the minority classes; these are called imbalanced datasets. Many applications, such as medical diagnosis and risk analysis, are interested in the under-represented class, but cla...

Bibliographic details
Main authors: Oliveira, D. V. R., Magalhaes, G. R., Cavalcanti, G. D. C., Ren, T. I.
Format: Conference proceedings
Language: eng
container_end_page 909
container_issue
container_start_page 904
container_title
container_volume 1
creator Oliveira, D. V. R.
Magalhaes, G. R.
Cavalcanti, G. D. C.
Ren, T. I.
description Some real-world datasets have skewed class proportions, with many instances of the majority classes and only a few of the minority classes; these are called imbalanced datasets. Many applications, such as medical diagnosis and risk analysis, are interested in the under-represented class, but classifiers and prototype generation techniques usually have a bias towards the majority classes. For this reason, classification with imbalanced datasets has become an important topic in Pattern Recognition. The Self-Generating Prototypes (SGP) algorithm has high reduction power and excellent performance on balanced datasets, but on imbalanced datasets the generated prototypes do not represent the training dataset well: the algorithm generates many prototypes of the majority classes and only a few, or even none, of the minority classes. This paper proposes the Adaptive Self-Generating Prototypes (ASGP) algorithm, an improvement of SGP2 (the second version of SGP) designed to handle imbalanced datasets. It also explains why SGP2 performs poorly on such datasets. Empirical results show that ASGP outperforms SGP2 on imbalanced datasets, especially in classification accuracy on the minority classes.
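The abstract's point about minority-class accuracy can be illustrated with a minimal sketch. This is not SGP, SGP2, or ASGP; it only assumes a hypothetical classifier that ended up with no minority-class prototypes and therefore always predicts the majority class. The labels, the 95/5 split, and the function names are all illustrative assumptions.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Map each true label to the fraction of its instances predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced toy data: 95 majority and 5 minority instances.
y_true = ["majority"] * 95 + ["minority"] * 5

# Assumed behaviour of a classifier that holds no minority prototypes:
# every instance is assigned to the majority class.
y_pred = ["majority"] * 100

overall = overall_accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)

print("overall accuracy:", overall)
print("per-class accuracy:", by_class)
```

In this toy setting the overall accuracy looks high even though every minority instance is misclassified, which is why results on imbalanced data are usually reported per class rather than as a single figure.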
doi_str_mv 10.1109/ICTAI.2012.126
format Conference Proceeding
isbn 1479902276
9781479902279
eisbn 0769549152
9780769549156
coden IEEPAD
publisher IEEE
creationdate 2012-11
tpages 6
fulltext fulltext_linktorsrc
identifier ISSN: 1082-3409
ispartof 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, 2012, Vol.1, p.904-909
issn 1082-3409
2375-0197
language eng
recordid cdi_ieee_primary_6495140
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Accuracy
Adaptive Self-Generating Prototypes ASGP
Algorithm design and analysis
Classification
Imbalanced Datasets
Medical diagnosis
Noise
Prediction algorithms
Prototype Generation (PG)
Prototypes
Self-Generating Prototypes (SGP)
Training
title Improved Self-Generating Prototypes Algorithm for Imbalanced Datasets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T04%3A43%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Improved%20Self-Generating%20Prototypes%20Algorithm%20for%20Imbalanced%20Datasets&rft.btitle=2012%20IEEE%2024th%20International%20Conference%20on%20Tools%20with%20Artificial%20Intelligence&rft.au=Oliveira,%20D.%20V.%20R.&rft.date=2012-11&rft.volume=1&rft.spage=904&rft.epage=909&rft.pages=904-909&rft.issn=1082-3409&rft.eissn=2375-0197&rft.isbn=1479902276&rft.isbn_list=9781479902279&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICTAI.2012.126&rft_dat=%3Cieee_6IE%3E6495140%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=0769549152&rft.eisbn_list=9780769549156&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6495140&rfr_iscdi=true