Extreme re-balancing for SVMs

There are many practical applications where learning from single class examples is either the only possible solution or has a distinct performance advantage. The first case occurs when obtaining examples of a second class is difficult, e.g., classifying sites of "interest" based on web a...

Bibliographic details
Published in:	SIGKDD explorations 2004-06, Vol.6 (1), p.60-69
Main authors:	Raskutti, Bhavani ; Kowalczyk, Adam
Format:	Article
Language:	eng ; jpn
Online access:	Full text

container_end_page	69
container_issue	1
container_start_page	60
container_title	SIGKDD explorations
container_volume	6
creator	Raskutti, Bhavani ; Kowalczyk, Adam
description	There are many practical applications where learning from single class examples is either the only possible solution or has a distinct performance advantage. The first case occurs when obtaining examples of a second class is difficult, e.g., classifying sites of "interest" based on web accesses. The second situation is exemplified by the gene knock-out experiments for understanding the Aryl Hydrocarbon Receptor signalling pathway that provided the data for the second task of the KDD 2002 Cup, where minority one-class SVMs significantly outperform models learnt using examples from both classes. This paper explores the limits of supervised learning of two-class discrimination from data with heavily unbalanced class proportions. We focus on the case of supervised learning with support vector machines. We consider the impact of both sampling and weighting imbalance compensation techniques and then extend the balancing to extreme situations when one of the classes is ignored completely and the learning is accomplished using examples from a single class. Our investigation with the data for the KDD 2002 Cup as well as text benchmarks such as Reuters Newswire shows that there is a consistent pattern of performance differences between one and two-class learning for all SVMs investigated, and these patterns persist even with aggressive dimensionality reduction through automated feature selection. Using insight gained from the above analysis, we generate synthetic data showing a similar pattern of performance.
doi_str_mv	10.1145/1007730.1007739
format	Article
fulltext	fulltext
identifier	ISSN: 1931-0145
ispartof	SIGKDD explorations, 2004-06, Vol.6 (1), p.60-69
issn	1931-0145
language	eng ; jpn
recordid	cdi_proquest_miscellaneous_29480572
source	ACM Digital Library Complete; EZB-FREE-00999 freely available EZB journals
title	Extreme re-balancing for SVMs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T15%3A14%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Extreme%20re-balancing%20for%20SVMs&rft.jtitle=SIGKDD%20explorations&rft.au=Raskutti,%20Bhavani&rft.date=2004-06-01&rft.volume=6&rft.issue=1&rft.spage=60&rft.epage=69&rft.pages=60-69&rft.issn=1931-0145&rft_id=info:doi/10.1145/1007730.1007739&rft_dat=%3Cproquest%3E29480572%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=29480572&rft_id=info:pmid/&rfr_iscdi=true