Robust formulations for clustering-based large-scale classification

Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is propo...

Bibliographic details
Published in:	Optimization and engineering 2013-06, Vol.14 (2), p.225-250
Main authors:	Jagarlapudi, Saketha Nath; Ben-Tal, Aharon; Bhattacharyya, Chiranjib
Format:	Article
Language:	eng
Subjects:	Chebyshev approximation; Classification; Classifiers; Clusters; Confidence; Confidence intervals; Control; Engineering; Environmental Management; Errors; Financial Engineering; Formulations; Mathematics; Mathematics and Statistics; Operations Research/Decision Theory; Optimization; Systems Theory
Online access:	Full text

container_end_page	250
container_issue	2
container_start_page	225
container_title	Optimization and engineering
container_volume	14
creator	Jagarlapudi, Saketha Nath ; Ben-Tal, Aharon ; Bhattacharyya, Chiranjib
description	Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high-density regions, or clusters, from the individual class-conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation depends heavily on the second-order statistics. However, this formulation, and more generally any relaxation that depends on second-order moments, is susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular, a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with the true moments, even when the moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.
doi_str_mv	10.1007/s11081-011-9166-y
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1372655834</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1372655834</sourcerecordid><originalsourceid>FETCH-LOGICAL-c349t-84b7c60bd9c40b0acbde73a3e9fc64d4239663b4c394439bb7c6b611ad3589ac3</originalsourceid><addsrcrecordid>eNp1kE1LxDAURYMoOH78AHcDbtxE85I0bZYy-AUDgug6JGk6dOg0Y167mH9val2I4CrJe-dewiHkCtgtMFbeIQCrgDIAqkEpejgiCyhKQbnm8jjfRaWplJydkjPELWOgCl4tyOotuhGHZRPTbuzs0MYep8fSd3kcUttvqLMY6mVn0yZQ9LYLeWkR26b134ELctLYDsPlz3lOPh4f3lfPdP369LK6X1MvpB5oJV3pFXO19pI5Zr2rQymsCLrxStaSC62UcNILLaXQbqKdArC1KCptvTgnN3PvPsXPMeBgdi360HW2D3FEA6LkqigqITN6_QfdxjH1-XeZUlxwBYJlCmbKp4iYQmP2qd3ZdDDAzKTVzFpN1momreaQM3zO4H6SE9Kv5n9DX83weyA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1362326130</pqid></control><display><type>article</type><title>Robust formulations for clustering-based large-scale classification</title><source>Springer Nature - Complete Springer Journals</source><creator>Jagarlapudi, Saketha Nath ; Ben-Tal, Aharon ; Bhattacharyya, Chiranjib</creator><creatorcontrib>Jagarlapudi, Saketha Nath ; Ben-Tal, Aharon ; Bhattacharyya, Chiranjib</creatorcontrib><description>Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high density regions or clusters from individual class conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. 
However, this formulation and in general such relaxations that depend on the second-order moments are susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to that with true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.</description><identifier>ISSN: 1389-4420</identifier><identifier>EISSN: 1573-2924</identifier><identifier>DOI: 10.1007/s11081-011-9166-y</identifier><language>eng</language><publisher>Boston: Springer US</publisher><subject>Chebyshev approximation ; Classification ; Classifiers ; Clusters ; Confidence ; Confidence intervals ; Control ; Engineering ; Environmental Management ; Errors ; Financial Engineering ; Formulations ; Mathematics ; Mathematics and Statistics ; Operations Research/Decision Theory ; Optimization ; Systems Theory</subject><ispartof>Optimization and engineering, 2013-06, Vol.14 (2), p.225-250</ispartof><rights>Springer Science+Business Media, LLC 2011</rights><rights>Springer Science+Business Media New York 
2013</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c349t-84b7c60bd9c40b0acbde73a3e9fc64d4239663b4c394439bb7c6b611ad3589ac3</citedby><cites>FETCH-LOGICAL-c349t-84b7c60bd9c40b0acbde73a3e9fc64d4239663b4c394439bb7c6b611ad3589ac3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11081-011-9166-y$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11081-011-9166-y$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Jagarlapudi, Saketha Nath</creatorcontrib><creatorcontrib>Ben-Tal, Aharon</creatorcontrib><creatorcontrib>Bhattacharyya, Chiranjib</creatorcontrib><title>Robust formulations for clustering-based large-scale classification</title><title>Optimization and engineering</title><addtitle>Optim Eng</addtitle><description>Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high density regions or clusters from individual class conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. However, this formulation and in general such relaxations that depend on the second-order moments are susceptible to moment estimation errors. 
One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to that with true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.</description><subject>Chebyshev approximation</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Clusters</subject><subject>Confidence</subject><subject>Confidence intervals</subject><subject>Control</subject><subject>Engineering</subject><subject>Environmental Management</subject><subject>Errors</subject><subject>Financial Engineering</subject><subject>Formulations</subject><subject>Mathematics</subject><subject>Mathematics and Statistics</subject><subject>Operations Research/Decision Theory</subject><subject>Optimization</subject><subject>Systems 
Theory</subject><issn>1389-4420</issn><issn>1573-2924</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNp1kE1LxDAURYMoOH78AHcDbtxE85I0bZYy-AUDgug6JGk6dOg0Y167mH9val2I4CrJe-dewiHkCtgtMFbeIQCrgDIAqkEpejgiCyhKQbnm8jjfRaWplJydkjPELWOgCl4tyOotuhGHZRPTbuzs0MYep8fSd3kcUttvqLMY6mVn0yZQ9LYLeWkR26b134ELctLYDsPlz3lOPh4f3lfPdP369LK6X1MvpB5oJV3pFXO19pI5Zr2rQymsCLrxStaSC62UcNILLaXQbqKdArC1KCptvTgnN3PvPsXPMeBgdi360HW2D3FEA6LkqigqITN6_QfdxjH1-XeZUlxwBYJlCmbKp4iYQmP2qd3ZdDDAzKTVzFpN1momreaQM3zO4H6SE9Kv5n9DX83weyA</recordid><startdate>20130601</startdate><enddate>20130601</enddate><creator>Jagarlapudi, Saketha Nath</creator><creator>Ben-Tal, Aharon</creator><creator>Bhattacharyya, Chiranjib</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7TB</scope><scope>8FD</scope><scope>FR3</scope><scope>KR7</scope></search><sort><creationdate>20130601</creationdate><title>Robust formulations for clustering-based large-scale classification</title><author>Jagarlapudi, Saketha Nath ; Ben-Tal, Aharon ; Bhattacharyya, Chiranjib</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c349t-84b7c60bd9c40b0acbde73a3e9fc64d4239663b4c394439bb7c6b611ad3589ac3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Chebyshev approximation</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Clusters</topic><topic>Confidence</topic><topic>Confidence intervals</topic><topic>Control</topic><topic>Engineering</topic><topic>Environmental Management</topic><topic>Errors</topic><topic>Financial Engineering</topic><topic>Formulations</topic><topic>Mathematics</topic><topic>Mathematics and Statistics</topic><topic>Operations Research/Decision Theory</topic><topic>Optimization</topic><topic>Systems 
Theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jagarlapudi, Saketha Nath</creatorcontrib><creatorcontrib>Ben-Tal, Aharon</creatorcontrib><creatorcontrib>Bhattacharyya, Chiranjib</creatorcontrib><collection>CrossRef</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Civil Engineering Abstracts</collection><jtitle>Optimization and engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jagarlapudi, Saketha Nath</au><au>Ben-Tal, Aharon</au><au>Bhattacharyya, Chiranjib</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Robust formulations for clustering-based large-scale classification</atitle><jtitle>Optimization and engineering</jtitle><stitle>Optim Eng</stitle><date>2013-06-01</date><risdate>2013</risdate><volume>14</volume><issue>2</issue><spage>225</spage><epage>250</epage><pages>225-250</pages><issn>1389-4420</issn><eissn>1573-2924</eissn><abstract>Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high density regions or clusters from individual class conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. 
However, this formulation and in general such relaxations that depend on the second-order moments are susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to that with true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.</abstract><cop>Boston</cop><pub>Springer US</pub><doi>10.1007/s11081-011-9166-y</doi><tpages>26</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1389-4420
ispartof	Optimization and engineering, 2013-06, Vol.14 (2), p.225-250
issn	1389-4420 1573-2924
language	eng
recordid	cdi_proquest_miscellaneous_1372655834
source	Springer Nature - Complete Springer Journals
subjects	Chebyshev approximation ; Classification ; Classifiers ; Clusters ; Confidence ; Confidence intervals ; Control ; Engineering ; Environmental Management ; Errors ; Financial Engineering ; Formulations ; Mathematics ; Mathematics and Statistics ; Operations Research/Decision Theory ; Optimization ; Systems Theory
title	Robust formulations for clustering-based large-scale classification
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T21%3A15%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20formulations%20for%20clustering-based%20large-scale%20classification&rft.jtitle=Optimization%20and%20engineering&rft.au=Jagarlapudi,%20Saketha%20Nath&rft.date=2013-06-01&rft.volume=14&rft.issue=2&rft.spage=225&rft.epage=250&rft.pages=225-250&rft.issn=1389-4420&rft.eissn=1573-2924&rft_id=info:doi/10.1007/s11081-011-9166-y&rft_dat=%3Cproquest_cross%3E1372655834%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1362326130&rft_id=info:pmid/&rfr_iscdi=true
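
The description field states that a Chebyshev-inequality-based relaxation turns the per-cluster chance constraint into a convex constraint that depends on second-order statistics. The record does not give the formulation itself, so the following is only a sketch of the standard one-sided Chebyshev (Cantelli) argument. All symbols are assumptions introduced here for illustration: cluster j has label y_j, mean \mu_j and covariance \Sigma_j, the classifier is sign(w^T x - b), \xi_j is a slack variable, and \eta is the required probability of correct classification. The paper's exact notation and its robust variants may differ.

```latex
% Assumed notation, not taken from the record:
%   Z = y_j (w^\top X_j - b),  m = y_j (w^\top \mu_j - b),  s^2 = w^\top \Sigma_j w
\begin{align*}
\Pr\bigl[\, y_j (w^\top X_j - b) \ge 1 - \xi_j \,\bigr] &\ge \eta
  && \text{(chance constraint for cluster } j\text{)} \\
\Pr\bigl[\, Z \le m - a \,\bigr] &\le \frac{s^2}{s^2 + a^2}, \quad a > 0
  && \text{(one-sided Chebyshev bound)} \\
\frac{s^2}{s^2 + a^2} \le 1 - \eta
  \;&\Longleftrightarrow\; a \ge \kappa\, s,
  \quad \kappa = \sqrt{\frac{\eta}{1 - \eta}}
  && \text{(with } a = m - (1 - \xi_j)\text{)} \\
y_j (w^\top \mu_j - b) &\ge 1 - \xi_j + \kappa \sqrt{w^\top \Sigma_j w}
  && \text{(sufficient second-order cone constraint)} \\
y_j (w^\top \mu_j - b) &\ge 1 - \xi_j + \kappa\, \sigma_j \lVert w \rVert_2
  && \text{(spherical case } \Sigma_j = \sigma_j^2 I\text{)}
\end{align*}
```

As a quick check of the constant under these assumptions, \eta = 0.9 gives \kappa = \sqrt{0.9/0.1} = 3. The constraint uses only \mu_j and \Sigma_j, which is why errors in the moment estimates carry over directly into the learned classifier.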