Sequential minimal optimization in convex clustering repetitions

Computing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged duri...
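The convexity noted in the abstract can be illustrated with a minimal sketch. It assumes one isotropic Gaussian kernel centred on every data point and a single fixed bandwidth, and it uses the plain Expectation-Maximization weight update, not the sequential minimal optimization or Newton–Raphson scheme the article proposes. The names `em_mixture_weights`, `log_likelihood`, `bandwidth`, and `n_iter` are hypothetical and do not come from the article.

```python
import numpy as np


def em_mixture_weights(X, bandwidth, n_iter=200):
    """Plain EM for mixture weights with fixed Gaussian kernels (illustrative only)."""
    n, d = X.shape
    # Squared Euclidean distance between every pair of points.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # K[i, j] = N(x_i | x_j, bandwidth^2 I). Centroids and bandwidth never change,
    # so K is computed once and the log-likelihood is concave in the weights.
    K = np.exp(-0.5 * sq_dist / bandwidth**2) / (2 * np.pi * bandwidth**2) ** (d / 2)
    w = np.full(n, 1.0 / n)  # start from uniform weights
    for _ in range(n_iter):
        mix = K @ w  # mixture density at each data point
        # Multiplicative EM update; the weights stay non-negative and sum to one.
        w = w * (K.T @ (1.0 / mix)) / n
    return w, K


def log_likelihood(K, w):
    """Log-likelihood of the data under the weighted kernel mixture."""
    return np.log(K @ w).sum()
```

Each update only rescales the weights, so many iterations may be needed before small weights shrink towards zero; this is the computational inefficiency of the Expectation-Maximization algorithm that the abstract refers to.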

Detailed description

Bibliographic details
Published in: Statistical analysis and data mining 2012-02, Vol.5 (1), p.70-89
Main author: Takahashi, Rikiya
Format: Article
Language: eng
Subjects: convex clustering; empirical-Bayes method; sequential minimal optimization
Online access: Full text
container_end_page 89
container_issue 1
container_start_page 70
container_title Statistical analysis and data mining
container_volume 5
creator Takahashi, Rikiya
description Computing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged during the updates. The global optimum of the mixture weights is sparse and clustering that utilizes the fitted sparse mixture model is called the convex clustering. To make the convex clustering practical in real applications, the author addresses three types of issues classified as (i) computational inefficiency of the Expectation‐Maximization algorithm, (ii) inconsistency of the bandwidth specifications between clustering and density estimation for high‐dimensional data, and (iii) selection of the optimal clustering from several bandwidth settings. The extremely large number of iterations needed in the Expectation‐Maximization algorithm is significantly reduced with an accurate pruning while choosing a pair of kernels and an element‐wise Newton–Raphson method. For high‐dimensional data, the convex clusterings are performed several times, with initially large bandwidths and succeeding smaller bandwidths. Since the number of clusters cannot be specified precisely in the convex clustering, practitioners often try multiple settings of the initial bandwidths. To choose the optimal clustering from the multiple results, the author proposes an empirical‐Bayes method that can choose appropriate bandwidths if the true clusters are Gaussian. The combination of the repetitions of the convex clusterings and the empirical‐Bayes model selection achieves stable prediction performances compared to the existing mixture learning methods. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 70–89, 2012
doi_str_mv 10.1002/sam.10146
format Article
fullrecord <record><control><sourceid>wiley_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1002_sam_10146</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>SAM10146</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1826-26c71cbcef437d354e4f7030e8281b3959d9a921b65717ae76dc644aa2f59a503</originalsourceid><addsrcrecordid>eNp1j8tOAjEUhhujiYgufIPZuhjpvTM7AQWNt0Q0LptSzpjqXLAdFHx6iyg7V-dPzvefnA-hY4JPCca0F0wVA-FyB3VIzmhKMkV3t1nyfXQQwivGQkaqg84m8L6AunWmTCpXuyrOZt66yn2Z1jV14urENvUHLBNbLkIL3tUviYc5tG69D4dorzBlgKPf2UVPo4vH4WV6cz--GvZvUksyKlMqrSJ2aqHgTM2Y4MALhRmGjGZkynKRz3KTUzKVQhFlQMmZlZwbQwuRG4FZF51s7lrfhOCh0HMfv_UrTbBeq-uorn_UI9vbsJ-uhNX_oJ70b_8a6abhouJy2zD-TUvFlNDPd2N9PRg8jMR5pin7Bmi-alw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Sequential minimal optimization in convex clustering repetitions</title><source>Wiley Online Library</source><creator>Takahashi, Rikiya</creator><creatorcontrib>Takahashi, Rikiya</creatorcontrib><description>Computing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged during the updates. The global optimum of the mixture weights is sparse and clustering that utilizes the fitted sparse mixture model is called the convex clustering. To make the convex clustering practical in real applications, the author addresses three types of issues classified as (i) computational inefficiency of the Expectation‐Maximization algorithm, (ii) inconsistency of the bandwidth specifications between clustering and density estimation for high‐dimensional data, and (iii) selection of the optimal clustering from several bandwidth settings. 
The extremely large number of iterations needed in the Expectation‐Maximization algorithm is significantly reduced with an accurate pruning while choosing a pair of kernels and an element‐wise Newton–Raphson method. For high‐dimensional data, the convex clusterings are performed several times, with initially large bandwidths and succeeding smaller bandwidths. Since the number of clusters cannot be specified precisely in the convex clustering, practitioners often try multiple settings of the initial bandwidths. To choose the optimal clustering from the multiple results, the author proposes an empirical‐Bayes method that can choose appropriate bandwidths if the true clusters are Gaussian. The combination of the repetitions of the convex clusterings and the empirical‐Bayes model selection achieves stable prediction performances compared to the existing mixture learning methods. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 70–89, 2012</description><identifier>ISSN: 1932-1864</identifier><identifier>EISSN: 1932-1872</identifier><identifier>DOI: 10.1002/sam.10146</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc., A Wiley Company</publisher><subject>convex clustering ; empirical-Bayes method ; sequential minimal optimization</subject><ispartof>Statistical analysis and data mining, 2012-02, Vol.5 (1), p.70-89</ispartof><rights>Copyright © 2011 Wiley Periodicals, 
Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c1826-26c71cbcef437d354e4f7030e8281b3959d9a921b65717ae76dc644aa2f59a503</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fsam.10146$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fsam.10146$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27903,27904,45553,45554</link.rule.ids></links><search><creatorcontrib>Takahashi, Rikiya</creatorcontrib><title>Sequential minimal optimization in convex clustering repetitions</title><title>Statistical analysis and data mining</title><addtitle>Statistical Analy Data Mining</addtitle><description>Computing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged during the updates. The global optimum of the mixture weights is sparse and clustering that utilizes the fitted sparse mixture model is called the convex clustering. To make the convex clustering practical in real applications, the author addresses three types of issues classified as (i) computational inefficiency of the Expectation‐Maximization algorithm, (ii) inconsistency of the bandwidth specifications between clustering and density estimation for high‐dimensional data, and (iii) selection of the optimal clustering from several bandwidth settings. The extremely large number of iterations needed in the Expectation‐Maximization algorithm is significantly reduced with an accurate pruning while choosing a pair of kernels and an element‐wise Newton–Raphson method. 
For high‐dimensional data, the convex clusterings are performed several times, with initially large bandwidths and succeeding smaller bandwidths. Since the number of clusters cannot be specified precisely in the convex clustering, practitioners often try multiple settings of the initial bandwidths. To choose the optimal clustering from the multiple results, the author proposes an empirical‐Bayes method that can choose appropriate bandwidths if the true clusters are Gaussian. The combination of the repetitions of the convex clusterings and the empirical‐Bayes model selection achieves stable prediction performances compared to the existing mixture learning methods. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 70–89, 2012</description><subject>convex clustering</subject><subject>empirical-Bayes method</subject><subject>sequential minimal optimization</subject><issn>1932-1864</issn><issn>1932-1872</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><recordid>eNp1j8tOAjEUhhujiYgufIPZuhjpvTM7AQWNt0Q0LptSzpjqXLAdFHx6iyg7V-dPzvefnA-hY4JPCca0F0wVA-FyB3VIzmhKMkV3t1nyfXQQwivGQkaqg84m8L6AunWmTCpXuyrOZt66yn2Z1jV14urENvUHLBNbLkIL3tUviYc5tG69D4dorzBlgKPf2UVPo4vH4WV6cz--GvZvUksyKlMqrSJ2aqHgTM2Y4MALhRmGjGZkynKRz3KTUzKVQhFlQMmZlZwbQwuRG4FZF51s7lrfhOCh0HMfv_UrTbBeq-uorn_UI9vbsJ-uhNX_oJ70b_8a6abhouJy2zD-TUvFlNDPd2N9PRg8jMR5pin7Bmi-alw</recordid><startdate>201202</startdate><enddate>201202</enddate><creator>Takahashi, Rikiya</creator><general>Wiley Subscription Services, Inc., A Wiley Company</general><scope>BSCLL</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>201202</creationdate><title>Sequential minimal optimization in convex clustering repetitions</title><author>Takahashi, 
Rikiya</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1826-26c71cbcef437d354e4f7030e8281b3959d9a921b65717ae76dc644aa2f59a503</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>convex clustering</topic><topic>empirical-Bayes method</topic><topic>sequential minimal optimization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Takahashi, Rikiya</creatorcontrib><collection>Istex</collection><collection>CrossRef</collection><jtitle>Statistical analysis and data mining</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Takahashi, Rikiya</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Sequential minimal optimization in convex clustering repetitions</atitle><jtitle>Statistical analysis and data mining</jtitle><addtitle>Statistical Analy Data Mining</addtitle><date>2012-02</date><risdate>2012</risdate><volume>5</volume><issue>1</issue><spage>70</spage><epage>89</epage><pages>70-89</pages><issn>1932-1864</issn><eissn>1932-1872</eissn><abstract>Computing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged during the updates. The global optimum of the mixture weights is sparse and clustering that utilizes the fitted sparse mixture model is called the convex clustering. 
To make the convex clustering practical in real applications, the author addresses three types of issues classified as (i) computational inefficiency of the Expectation‐Maximization algorithm, (ii) inconsistency of the bandwidth specifications between clustering and density estimation for high‐dimensional data, and (iii) selection of the optimal clustering from several bandwidth settings. The extremely large number of iterations needed in the Expectation‐Maximization algorithm is significantly reduced with an accurate pruning while choosing a pair of kernels and an element‐wise Newton–Raphson method. For high‐dimensional data, the convex clusterings are performed several times, with initially large bandwidths and succeeding smaller bandwidths. Since the number of clusters cannot be specified precisely in the convex clustering, practitioners often try multiple settings of the initial bandwidths. To choose the optimal clustering from the multiple results, the author proposes an empirical‐Bayes method that can choose appropriate bandwidths if the true clusters are Gaussian. The combination of the repetitions of the convex clusterings and the empirical‐Bayes model selection achieves stable prediction performances compared to the existing mixture learning methods. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 70–89, 2012</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc., A Wiley Company</pub><doi>10.1002/sam.10146</doi><tpages>20</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1932-1864
ispartof Statistical analysis and data mining, 2012-02, Vol.5 (1), p.70-89
issn 1932-1864
1932-1872
language eng
recordid cdi_crossref_primary_10_1002_sam_10146
source Wiley Online Library
subjects convex clustering
empirical-Bayes method
sequential minimal optimization
title Sequential minimal optimization in convex clustering repetitions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T14%3A41%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wiley_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sequential%20minimal%20optimization%20in%20convex%20clustering%20repetitions&rft.jtitle=Statistical%20analysis%20and%20data%20mining&rft.au=Takahashi,%20Rikiya&rft.date=2012-02&rft.volume=5&rft.issue=1&rft.spage=70&rft.epage=89&rft.pages=70-89&rft.issn=1932-1864&rft.eissn=1932-1872&rft_id=info:doi/10.1002/sam.10146&rft_dat=%3Cwiley_cross%3ESAM10146%3C/wiley_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true