Sequential minimal optimization in convex clustering repetitions
Computing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged during the updates. The global optimum of the mixture weights is sparse, and clustering that utilizes the fitted sparse mixture model is called convex clustering. To make convex clustering practical in real applications, the author addresses three types of issues, classified as (i) computational inefficiency of the Expectation-Maximization algorithm, (ii) inconsistency of the bandwidth specifications between clustering and density estimation for high-dimensional data, and (iii) selection of the optimal clustering from several bandwidth settings. The extremely large number of iterations needed in the Expectation-Maximization algorithm is significantly reduced by accurately pruning the choice of kernel pairs and by an element-wise Newton–Raphson method. For high-dimensional data, the convex clusterings are performed several times, with initially large bandwidths and succeeding smaller bandwidths. Since the number of clusters cannot be specified precisely in convex clustering, practitioners often try multiple settings of the initial bandwidths. To choose the optimal clustering from the multiple results, the author proposes an empirical-Bayes method that can choose appropriate bandwidths if the true clusters are Gaussian. The combination of repeated convex clusterings and empirical-Bayes model selection achieves stable prediction performance compared with existing mixture learning methods. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 70–89, 2012
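To make the optimization the abstract describes concrete: in convex clustering, every data point x_j is kept as a candidate kernel centroid with a fixed bandwidth, and only the mixture weights w are fitted, by maximizing the concave log-likelihood (1/n) Σ_i log Σ_j w_j K(x_i, x_j) over the probability simplex. The sketch below (Python/NumPy) shows a plain EM update for the weights and an SMO-style step that shifts mass between one pair of kernels via a one-dimensional Newton–Raphson iteration, which is the kind of pairwise update the title refers to. It is a minimal illustration, not the paper's implementation: the accurate pruning and pair-selection rules are omitted, and all function names and numerical safeguards here are assumptions.

```python
import numpy as np

def gaussian_kernel_matrix(X, bandwidth):
    """K[i, j] = isotropic Gaussian density N(x_i; x_j, bandwidth**2 * I).
    Every data point doubles as a candidate cluster centroid."""
    n, d = X.shape
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / bandwidth**2) / (np.sqrt(2.0 * np.pi) * bandwidth) ** d

def em_weight_update(w, K):
    """One EM step on the mixture weights alone: w_j <- mean_i r_ij, where
    r_ij is the responsibility of kernel j for point i.  With centroids and
    bandwidths frozen, the log-likelihood is concave in w, so these updates
    head for the global (and typically sparse) optimum."""
    R = (K * w) / (K @ w)[:, None]        # responsibilities; each row sums to 1
    return R.mean(axis=0)

def smo_pair_update(w, K, j, k, n_newton=5):
    """SMO-style step: keep s = w_j + w_k fixed and redistribute it between
    kernels j and k by a 1-D Newton-Raphson ascent on the log-likelihood,
    which is concave in the transferred mass t."""
    s = w[j] + w[k]
    if s == 0.0:
        return w
    a, b = K[:, j], K[:, k]
    rest = K @ w - w[j] * a - w[k] * b    # density contribution of all other kernels
    t = w[j]
    for _ in range(n_newton):
        dens = rest + t * a + (s - t) * b
        ratio = (a - b) / dens
        grad = ratio.sum()                # dL/dt
        hess = -(ratio ** 2).sum()        # d2L/dt2, always <= 0
        if hess == 0.0:
            break
        t = float(np.clip(t - grad / hess, 0.0, s))  # stay on the simplex face
    w = w.copy()
    w[j], w[k] = t, s - t
    return w
```

Under the same assumptions, the "repetitions" of the title can be sketched as a driver that reruns the weight fit with a shrinking bandwidth schedule, warm-starting each run from the previous weights; the schedule, iteration counts, and the survival threshold below are placeholders, not the paper's settings. In the paper's method, calls like `smo_pair_update(w, K, j, k)` on well-chosen, pruned pairs replace many of these full EM sweeps.

```python
# Hypothetical driver: repeated convex clusterings, large to small bandwidths.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
w = np.full(len(X), 1.0 / len(X))
for bw in (2.0, 1.0, 0.5):                # placeholder bandwidth schedule
    K = gaussian_kernel_matrix(X, bw)
    for _ in range(100):                  # fixed-point sweeps per bandwidth
        w = em_weight_update(w, K)
centroid_idx = np.flatnonzero(w > 1e-3)   # surviving kernels ~ cluster centroids
```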
Saved in:
Published in: | Statistical analysis and data mining, 2012-02, Vol. 5 (1), p. 70-89 |
---|---|
Author: | Takahashi, Rikiya |
Format: | Article |
Language: | English |
Subjects: | convex clustering; empirical-Bayes method; sequential minimal optimization |
DOI: | 10.1002/sam.10146 |
ISSN: | 1932-1864 (eISSN: 1932-1872) |
Publisher: | Wiley Subscription Services, Inc., A Wiley Company (Hoboken) |
Online access: | Full text |