A new segmented oversampling method for imbalanced data classification using quasi‐linear SVM

Data imbalance occurs on most real‐world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential pr...

Bibliographic details
Published in:	IEEJ transactions on electrical and electronic engineering 2017-11, Vol.12 (6), p.891-898
Main authors:	Zhou, Bo; Li, Weite; Hu, Jinglu
Format:	Article
Language:	eng
Subjects:	Classification; Clustering; imbalanced classification; kernel composition; local linear partition; Oversampling; oversampling method; Partitions; support vector machine; Support vector machines
Online access:	Full text

container_end_page	898
container_issue	6
container_start_page	891
container_title	IEEJ transactions on electrical and electronic engineering
container_volume	12
creator	Zhou, Bo; Li, Weite; Hu, Jinglu
description	Data imbalance occurs on most real‐world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential problems in generating wrong and unnecessary synthetic samples, which makes the learning tasks difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom‐up, minimal‐spanning‐tree‐based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi‐linear support vector machine is finally used to realize the classification by taking advantages of the local linear partitions. Simulation results on different real‐world datasets show that the proposed segmented oversampling method is effective for imbalanced data classifications. © 2017 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
doi_str_mv	10.1002/tee.22480
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1945587224</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1945587224</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3630-3fcd348e67dd247e4b485cf768df005994671cef4130474c38ef8952a4a8ee263</originalsourceid><addsrcrecordid>eNp10LFOwzAQBmALgUQpDLyBJSaGtHbsJM5YVaUgFTFQWC3XORdXSdzaCVU3HoFn5ElICWJjuhu-_076EbqmZEQJiccNwCiOuSAnaEBzRiOeC3r6t2fsHF2EsCGEp0yIAZITXMMeB1hXUDdQYPcOPqhqW9p6jSto3lyBjfPYVitVqlp3pFCNwrpUIVhjtWqsq3Ebjn7XqmC_Pj67MCiPn18fL9GZUWWAq985RC93s-X0Plo8zR-mk0WkWcpIxIwuGBeQZkUR8wz4iotEmywVhSEkyXOeZlSD4ZQRnnHNBBiRJ7HiSgDEKRuim_7u1rtdC6GRG9f6unspac6TRGRdK5267ZX2LgQPRm69rZQ_SErksT_Z9Sd_-uvsuLd7W8LhfyiXs1mf-AYA9XLB</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1945587224</pqid></control><display><type>article</type><title>A new segmented oversampling method for imbalanced data classification using quasi‐linear SVM</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Zhou, Bo ; Li, Weite ; Hu, Jinglu</creator><creatorcontrib>Zhou, Bo ; Li, Weite ; Hu, Jinglu</creatorcontrib><description>Data imbalance occurs on most real‐world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential problems in generating wrong and unnecessary synthetic samples, which makes the learning tasks difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. 
The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom‐up, minimal‐spanning‐tree‐based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi‐linear support vector machine is finally used to realize the classification by taking advantages of the local linear partitions. Simulation results on different real‐world datasets show that the proposed segmented oversampling method is effective for imbalanced data classifications. © 2017 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.</description><identifier>ISSN: 1931-4973</identifier><identifier>EISSN: 1931-4981</identifier><identifier>DOI: 10.1002/tee.22480</identifier><language>eng</language><publisher>Hoboken, USA: John Wiley & Sons, Inc</publisher><subject>Classification ; Clustering ; imbalanced classification ; kernel composition ; local linear partition ; Oversampling ; oversampling method ; Partitions ; support vector machine ; Support vector machines</subject><ispartof>IEEJ transactions on electrical and electronic engineering, 2017-11, Vol.12 (6), p.891-898</ispartof><rights>2017 Institute of Electrical Engineers of Japan. 
Published by John Wiley & Sons, Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3630-3fcd348e67dd247e4b485cf768df005994671cef4130474c38ef8952a4a8ee263</citedby><cites>FETCH-LOGICAL-c3630-3fcd348e67dd247e4b485cf768df005994671cef4130474c38ef8952a4a8ee263</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Ftee.22480$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Ftee.22480$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,777,781,1413,27906,27907,45556,45557</link.rule.ids></links><search><creatorcontrib>Zhou, Bo</creatorcontrib><creatorcontrib>Li, Weite</creatorcontrib><creatorcontrib>Hu, Jinglu</creatorcontrib><title>A new segmented oversampling method for imbalanced data classification using quasi‐linear SVM</title><title>IEEJ transactions on electrical and electronic engineering</title><description>Data imbalance occurs on most real‐world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential problems in generating wrong and unnecessary synthetic samples, which makes the learning tasks difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. 
The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom‐up, minimal‐spanning‐tree‐based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi‐linear support vector machine is finally used to realize the classification by taking advantages of the local linear partitions. Simulation results on different real‐world datasets show that the proposed segmented oversampling method is effective for imbalanced data classifications. © 2017 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.</description><subject>Classification</subject><subject>Clustering</subject><subject>imbalanced classification</subject><subject>kernel composition</subject><subject>local linear partition</subject><subject>Oversampling</subject><subject>oversampling method</subject><subject>Partitions</subject><subject>support vector machine</subject><subject>Support vector machines</subject><issn>1931-4973</issn><issn>1931-4981</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNp10LFOwzAQBmALgUQpDLyBJSaGtHbsJM5YVaUgFTFQWC3XORdXSdzaCVU3HoFn5ElICWJjuhu-_076EbqmZEQJiccNwCiOuSAnaEBzRiOeC3r6t2fsHF2EsCGEp0yIAZITXMMeB1hXUDdQYPcOPqhqW9p6jSto3lyBjfPYVitVqlp3pFCNwrpUIVhjtWqsq3Ebjn7XqmC_Pj67MCiPn18fL9GZUWWAq985RC93s-X0Plo8zR-mk0WkWcpIxIwuGBeQZkUR8wz4iotEmywVhSEkyXOeZlSD4ZQRnnHNBBiRJ7HiSgDEKRuim_7u1rtdC6GRG9f6unspac6TRGRdK5267ZX2LgQPRm69rZQ_SErksT_Z9Sd_-uvsuLd7W8LhfyiXs1mf-AYA9XLB</recordid><startdate>201711</startdate><enddate>201711</enddate><creator>Zhou, Bo</creator><creator>Li, Weite</creator><creator>Hu, Jinglu</creator><general>John Wiley & Sons, Inc</general><general>Wiley Subscription Services, 
Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope></search><sort><creationdate>201711</creationdate><title>A new segmented oversampling method for imbalanced data classification using quasi‐linear SVM</title><author>Zhou, Bo ; Li, Weite ; Hu, Jinglu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3630-3fcd348e67dd247e4b485cf768df005994671cef4130474c38ef8952a4a8ee263</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Classification</topic><topic>Clustering</topic><topic>imbalanced classification</topic><topic>kernel composition</topic><topic>local linear partition</topic><topic>Oversampling</topic><topic>oversampling method</topic><topic>Partitions</topic><topic>support vector machine</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Bo</creatorcontrib><creatorcontrib>Li, Weite</creatorcontrib><creatorcontrib>Hu, Jinglu</creatorcontrib><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEJ transactions on electrical and electronic engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhou, Bo</au><au>Li, Weite</au><au>Hu, Jinglu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A new segmented oversampling method for imbalanced data classification using quasi‐linear SVM</atitle><jtitle>IEEJ transactions on electrical and electronic 
engineering</jtitle><date>2017-11</date><risdate>2017</risdate><volume>12</volume><issue>6</issue><spage>891</spage><epage>898</epage><pages>891-898</pages><issn>1931-4973</issn><eissn>1931-4981</eissn><abstract>Data imbalance occurs on most real‐world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential problems in generating wrong and unnecessary synthetic samples, which makes the learning tasks difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom‐up, minimal‐spanning‐tree‐based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi‐linear support vector machine is finally used to realize the classification by taking advantages of the local linear partitions. Simulation results on different real‐world datasets show that the proposed segmented oversampling method is effective for imbalanced data classifications. © 2017 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.</abstract><cop>Hoboken, USA</cop><pub>John Wiley & Sons, Inc</pub><doi>10.1002/tee.22480</doi><tpages>6</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1931-4973
ispartof	IEEJ transactions on electrical and electronic engineering, 2017-11, Vol.12 (6), p.891-898
issn	1931-4973 1931-4981
language	eng
recordid	cdi_proquest_journals_1945587224
source	Wiley Online Library Journals Frontfile Complete
subjects	Classification; Clustering; imbalanced classification; kernel composition; local linear partition; Oversampling; oversampling method; Partitions; support vector machine; Support vector machines
title	A new segmented oversampling method for imbalanced data classification using quasi‐linear SVM
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T08%3A55%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20segmented%20oversampling%20method%20for%20imbalanced%20data%20classification%20using%20quasi%E2%80%90linear%20SVM&rft.jtitle=IEEJ%20transactions%20on%20electrical%20and%20electronic%20engineering&rft.au=Zhou,%20Bo&rft.date=2017-11&rft.volume=12&rft.issue=6&rft.spage=891&rft.epage=898&rft.pages=891-898&rft.issn=1931-4973&rft.eissn=1931-4981&rft_id=info:doi/10.1002/tee.22480&rft_dat=%3Cproquest_cross%3E1945587224%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1945587224&rft_id=info:pmid/&rfr_iscdi=true
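
The description field above outlines a pipeline of partitioning, oversampling within each partition, and classification. The first two stages can be illustrated with a rough sketch. This is not the authors' algorithm: the record gives no implementation details, so the sketch assumes a plain single-linkage clustering built from a minimal spanning tree (Kruskal's algorithm stopped early) applied to the minority samples only, followed by SMOTE-style linear interpolation restricted to pairs in the same partition. The function names `mst_clusters` and `segmented_oversample` are hypothetical, and the quasi-linear SVM stage is omitted.

```python
import random
from itertools import combinations
from math import dist


def mst_clusters(points, n_clusters):
    """Bottom-up single-linkage clustering: Kruskal's MST, stopped early.

    Assumption: plain Euclidean single linkage, not the paper's
    boundary-aware partitioning.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    remaining = len(points)
    for a, b in edges:
        if remaining <= n_clusters:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            remaining -= 1
    return [find(i) for i in range(len(points))]


def segmented_oversample(minority, labels, n_new, rng):
    """Interpolate only between minority points that share a partition."""
    groups = {}
    for point, label in zip(minority, labels):
        groups.setdefault(label, []).append(point)
    # A partition needs at least two points to interpolate between.
    usable = [g for g in groups.values() if len(g) >= 2]
    if not usable:
        return []
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rng.choice(usable), 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Two well-separated minority groups (made-up illustrative data).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
labels = mst_clusters(minority, n_clusters=2)
new_points = segmented_oversample(minority, labels, n_new=4,
                                  rng=random.Random(0))
```

Because interpolation never crosses a partition, no synthetic point falls in the empty region between the two groups, which is the kind of "wrong" sample the description says the segmented approach is meant to prevent.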