SMOTE_EASY: UM ALGORITMO PARA TRATAR O PROBLEMA DE CLASSIFICACAO EM BASES DE DADOS REAIS/SMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES

Bibliographic details
Published in: Revista de gestão da tecnologia e sistemas de informação 2016-01, Vol.13 (1), p.61-61
Main authors: Rufino, Hugo Leonardo Pereira, Veiga, Antonio Claudio Paschoarelli, Nakamoto, Paula Teixeira
Format: Article
Language: English
Online access: Full text
container_end_page 61
container_issue 1
container_start_page 61
container_title Revista de gestão da tecnologia e sistemas de informação
container_volume 13
creator Rufino, Hugo Leonardo Pereira
Veiga, Antonio Claudio Paschoarelli
Nakamoto, Paula Teixeira
description Most classification tools assume that the class distribution is balanced, or that misclassification costs are similar across classes. In practice, however, databases with unbalanced classes are commonplace: in disease diagnosis, for example, confirmed cases are usually rare compared with the healthy population. Other examples include the detection of fraudulent calls and the detection of system intruders. In these cases, misclassifying a minority-class instance (for instance, diagnosing a person with cancer as healthy) may have more serious consequences than misclassifying a majority-class instance. It is therefore important to treat databases in which unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data even when the classes are highly unbalanced. To demonstrate its efficiency, it was compared with the main algorithms for classification on unbalanced data. The algorithm was successful on nearly all of the tested databases.
doi_str_mv 10.4301/S1807-17752016000100004
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_miscellaneous_1816075577</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1816075577</sourcerecordid><originalsourceid>FETCH-proquest_miscellaneous_18160755773</originalsourceid><addsrcrecordid>eNqV0MFOwzAMBuAIgcQEewZ85FKWbO3ScvPajEZqGhRnB07VQEWaVBgQ9ko8J9kEaFcOlm391ncwY1eC36QzLiYkci4TIWU25WLOORexeHrCRn_B6WEukuk85edsHMLmkadFnnEpihH7ImO96hTSwy2sDGBzZ532xsI9OgTv0KODuDm7aJRBqBSUDRLppS6xRAvKwAJJ0T6psLIETqGmCdkjGdtfuTbgbXQVevD1Mea1bUETrRTodo800fN4wC_Z2fN6CP34p1-w66XyZZ28fWzfd3347F424akfhvVrv92FTuTxGzLLpJz94_Qbm6halg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1816075577</pqid></control><display><type>article</type><title>SMOTE_EASY: UM ALGORITMO PARA TRATAR O PROBLEMA DE CLASSIFICACAO EM BASES DE DADOS REAIS/SOMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES</title><source>EZB-FREE-00999 freely available EZB journals</source><creator>Rufino, Hugo Leonardo Pereira ; Veiga, Antonio Claudio Paschoarelli ; Nakamoto, Paula Teixeira</creator><creatorcontrib>Rufino, Hugo Leonardo Pereira ; Veiga, Antonio Claudio Paschoarelli ; Nakamoto, Paula Teixeira</creatorcontrib><description>Most classification tools assume that data distribution be balanced or with similar costs, when not properly classified. Nevertheless, in practical terms, the existence of database where unbalanced classes occur is commonplace, such as in the diagnosis of diseases, in which the confirmed cases are usually rare when compared with a healthy population. Other examples are the detection of fraudulent calls and the detection of system intruders. In these cases, the improper classification of a minority class (for instance, to diagnose a person with cancer as healthy) may result in more serious consequences that incorrectly classify a majority class. 
Therefore, it is important to treat the database where unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data, even if there is a high level of unbalancing between different classes. In order to prove its efficiency, a comparison with the main algorithms to treat classification issues was made, where unbalanced data exist. This process was successful in nearly all tested databases</description><identifier>ISSN: 1809-2640</identifier><identifier>EISSN: 1807-1775</identifier><identifier>DOI: 10.4301/S1807-17752016000100004</identifier><language>eng</language><subject>Algorithms ; Cancer ; Classification ; Diagnosis ; Information systems ; Level (quantity) ; Management</subject><ispartof>Revista de gestão da tecnologia e sistemas de informação, 2016-01, Vol.13 (1), p.61-61</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Rufino, Hugo Leonardo Pereira</creatorcontrib><creatorcontrib>Veiga, Antonio Claudio Paschoarelli</creatorcontrib><creatorcontrib>Nakamoto, Paula Teixeira</creatorcontrib><title>SMOTE_EASY: UM ALGORITMO PARA TRATAR O PROBLEMA DE CLASSIFICACAO EM BASES DE DADOS REAIS/SOMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES</title><title>Revista de gestão da tecnologia e sistemas de informação</title><description>Most classification tools assume that data distribution be balanced or with similar costs, when not properly classified. Nevertheless, in practical terms, the existence of database where unbalanced classes occur is commonplace, such as in the diagnosis of diseases, in which the confirmed cases are usually rare when compared with a healthy population. 
Other examples are the detection of fraudulent calls and the detection of system intruders. In these cases, the improper classification of a minority class (for instance, to diagnose a person with cancer as healthy) may result in more serious consequences that incorrectly classify a majority class. Therefore, it is important to treat the database where unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data, even if there is a high level of unbalancing between different classes. In order to prove its efficiency, a comparison with the main algorithms to treat classification issues was made, where unbalanced data exist. This process was successful in nearly all tested databases</description><subject>Algorithms</subject><subject>Cancer</subject><subject>Classification</subject><subject>Diagnosis</subject><subject>Information systems</subject><subject>Level (quantity)</subject><subject>Management</subject><issn>1809-2640</issn><issn>1807-1775</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNqV0MFOwzAMBuAIgcQEewZ85FKWbO3ScvPajEZqGhRnB07VQEWaVBgQ9ko8J9kEaFcOlm391ncwY1eC36QzLiYkci4TIWU25WLOORexeHrCRn_B6WEukuk85edsHMLmkadFnnEpihH7ImO96hTSwy2sDGBzZ532xsI9OgTv0KODuDm7aJRBqBSUDRLppS6xRAvKwAJJ0T6psLIETqGmCdkjGdtfuTbgbXQVevD1Mea1bUETrRTodo800fN4wC_Z2fN6CP34p1-w66XyZZ28fWzfd3347F424akfhvVrv92FTuTxGzLLpJz94_Qbm6halg</recordid><startdate>20160101</startdate><enddate>20160101</enddate><creator>Rufino, Hugo Leonardo Pereira</creator><creator>Veiga, Antonio Claudio Paschoarelli</creator><creator>Nakamoto, Paula Teixeira</creator><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20160101</creationdate><title>SMOTE_EASY: UM ALGORITMO PARA TRATAR O PROBLEMA DE CLASSIFICACAO EM BASES DE DADOS REAIS/SOMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL 
DATABASES</title><author>Rufino, Hugo Leonardo Pereira ; Veiga, Antonio Claudio Paschoarelli ; Nakamoto, Paula Teixeira</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_miscellaneous_18160755773</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Cancer</topic><topic>Classification</topic><topic>Diagnosis</topic><topic>Information systems</topic><topic>Level (quantity)</topic><topic>Management</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rufino, Hugo Leonardo Pereira</creatorcontrib><creatorcontrib>Veiga, Antonio Claudio Paschoarelli</creatorcontrib><creatorcontrib>Nakamoto, Paula Teixeira</creatorcontrib><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Revista de gestão da tecnologia e sistemas de informação</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rufino, Hugo Leonardo Pereira</au><au>Veiga, Antonio Claudio Paschoarelli</au><au>Nakamoto, Paula Teixeira</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SMOTE_EASY: UM ALGORITMO PARA TRATAR O PROBLEMA DE CLASSIFICACAO EM BASES DE DADOS REAIS/SOMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES</atitle><jtitle>Revista de gestão da tecnologia e sistemas de 
informação</jtitle><date>2016-01-01</date><risdate>2016</risdate><volume>13</volume><issue>1</issue><spage>61</spage><epage>61</epage><pages>61-61</pages><issn>1809-2640</issn><eissn>1807-1775</eissn><abstract>Most classification tools assume that data distribution be balanced or with similar costs, when not properly classified. Nevertheless, in practical terms, the existence of database where unbalanced classes occur is commonplace, such as in the diagnosis of diseases, in which the confirmed cases are usually rare when compared with a healthy population. Other examples are the detection of fraudulent calls and the detection of system intruders. In these cases, the improper classification of a minority class (for instance, to diagnose a person with cancer as healthy) may result in more serious consequences that incorrectly classify a majority class. Therefore, it is important to treat the database where unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data, even if there is a high level of unbalancing between different classes. In order to prove its efficiency, a comparison with the main algorithms to treat classification issues was made, where unbalanced data exist. This process was successful in nearly all tested databases</abstract><doi>10.4301/S1807-17752016000100004</doi></addata></record>
fulltext fulltext
identifier ISSN: 1809-2640
ispartof Revista de gestão da tecnologia e sistemas de informação, 2016-01, Vol.13 (1), p.61-61
issn 1809-2640
1807-1775
language eng
recordid cdi_proquest_miscellaneous_1816075577
source EZB-FREE-00999 freely available EZB journals
subjects Algorithms
Cancer
Classification
Diagnosis
Information systems
Level (quantity)
Management
title SMOTE_EASY: UM ALGORITMO PARA TRATAR O PROBLEMA DE CLASSIFICACAO EM BASES DE DADOS REAIS/SMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T03%3A57%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SMOTE_EASY:%20UM%20ALGORITMO%20PARA%20TRATAR%20O%20PROBLEMA%20DE%20CLASSIFICACAO%20EM%20BASES%20DE%20DADOS%20REAIS/SOMOTE_EASY:%20AN%20ALGORITHM%20TO%20TREAT%20THE%20CLASSIFICATION%20ISSUE%20IN%20REAL%20DATABASES&rft.jtitle=Revista%20de%20gest%C3%A3o%20da%20tecnologia%20e%20sistemas%20de%20informa%C3%A7%C3%A3o&rft.au=Rufino,%20Hugo%20Leonardo%20Pereira&rft.date=2016-01-01&rft.volume=13&rft.issue=1&rft.spage=61&rft.epage=61&rft.pages=61-61&rft.issn=1809-2640&rft.eissn=1807-1775&rft_id=info:doi/10.4301/S1807-17752016000100004&rft_dat=%3Cproquest%3E1816075577%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1816075577&rft_id=info:pmid/&rfr_iscdi=true