Sequential minimal optimization in convex clustering repetitions
Computing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged during the updates. The global optimum of the mixture weights is sparse, and clustering that utilizes the fitted sparse mixture model is called convex clustering. To make convex clustering practical in real applications, the author addresses three types of issues, classified as (i) computational inefficiency of the Expectation-Maximization algorithm, (ii) inconsistency of the bandwidth specifications between clustering and density estimation for high-dimensional data, and (iii) selection of the optimal clustering from several bandwidth settings. The extremely large number of iterations needed in the Expectation-Maximization algorithm is significantly reduced by accurately pruning the choice of kernel pairs and by an element-wise Newton–Raphson method. For high-dimensional data, the convex clusterings are performed several times, with initially large bandwidths and succeeding smaller bandwidths. Since the number of clusters cannot be specified precisely in convex clustering, practitioners often try multiple settings of the initial bandwidths. To choose the optimal clustering from the multiple results, the author proposes an empirical-Bayes method that can choose appropriate bandwidths if the true clusters are Gaussian. The combination of repeated convex clusterings and empirical-Bayes model selection achieves stable prediction performance compared with existing mixture learning methods. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 70–89, 2012
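To make the optimization the abstract describes concrete: in convex clustering, every data point x_j is kept as a candidate kernel centroid with a fixed bandwidth, and only the mixture weights w are fitted, by maximizing the concave log-likelihood (1/n) Σ_i log Σ_j w_j K(x_i, x_j) over the probability simplex. The sketch below (Python/NumPy) shows a plain EM update for the weights and an SMO-style step that shifts mass between one pair of kernels via a one-dimensional Newton–Raphson iteration, which is the kind of pairwise update the title refers to. It is a minimal illustration, not the paper's implementation: the accurate pruning and pair-selection rules are omitted, and all function names and numerical safeguards here are assumptions.

```python
import numpy as np

def gaussian_kernel_matrix(X, bandwidth):
    """K[i, j] = isotropic Gaussian density N(x_i; x_j, bandwidth**2 * I).
    Every data point doubles as a candidate cluster centroid."""
    n, d = X.shape
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / bandwidth**2) / (np.sqrt(2.0 * np.pi) * bandwidth) ** d

def em_weight_update(w, K):
    """One EM step on the mixture weights alone: w_j <- mean_i r_ij, where
    r_ij is the responsibility of kernel j for point i.  With centroids and
    bandwidths frozen, the log-likelihood is concave in w, so these updates
    head for the global (and typically sparse) optimum."""
    R = (K * w) / (K @ w)[:, None]        # responsibilities; each row sums to 1
    return R.mean(axis=0)

def smo_pair_update(w, K, j, k, n_newton=5):
    """SMO-style step: keep s = w_j + w_k fixed and redistribute it between
    kernels j and k by a 1-D Newton-Raphson ascent on the log-likelihood,
    which is concave in the transferred mass t."""
    s = w[j] + w[k]
    if s == 0.0:
        return w
    a, b = K[:, j], K[:, k]
    rest = K @ w - w[j] * a - w[k] * b    # density contribution of all other kernels
    t = w[j]
    for _ in range(n_newton):
        dens = rest + t * a + (s - t) * b
        ratio = (a - b) / dens
        grad = ratio.sum()                # dL/dt
        hess = -(ratio ** 2).sum()        # d2L/dt2, always <= 0
        if hess == 0.0:
            break
        t = float(np.clip(t - grad / hess, 0.0, s))  # stay on the simplex face
    w = w.copy()
    w[j], w[k] = t, s - t
    return w
```

Under the same assumptions, the "repetitions" of the title can be sketched as a driver that reruns the weight fit with a shrinking bandwidth schedule, warm-starting each run from the previous weights; the schedule, iteration counts, and the survival threshold below are placeholders, not the paper's settings. In the paper's method, calls like `smo_pair_update(w, K, j, k)` on well-chosen, pruned pairs replace many of these full EM sweeps.

```python
# Hypothetical driver: repeated convex clusterings, large to small bandwidths.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
w = np.full(len(X), 1.0 / len(X))
for bw in (2.0, 1.0, 0.5):                # placeholder bandwidth schedule
    K = gaussian_kernel_matrix(X, bw)
    for _ in range(100):                  # fixed-point sweeps per bandwidth
        w = em_weight_update(w, K)
centroid_idx = np.flatnonzero(w > 1e-3)   # surviving kernels ~ cluster centroids
```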
Saved in:
Published in: | Statistical analysis and data mining, 2012-02, Vol. 5 (1), p. 70-89 |
---|---|
Author: | Takahashi, Rikiya |
Format: | Article |
Language: | English |
Subjects: | convex clustering; empirical-Bayes method; sequential minimal optimization |
DOI: | 10.1002/sam.10146 |
ISSN: | 1932-1864 (eISSN: 1932-1872) |
Publisher: | Wiley Subscription Services, Inc., A Wiley Company (Hoboken) |
Online access: | Full text |