Accelerating EM clustering to find high-quality solutions

Clustering is one of the most important techniques used in data mining. This article focuses on the EM clustering algorithm. Two fundamental aspects are studied: achieving faster convergence and finding higher quality clustering solutions. This work introduces several improvements to the EM clusteri...
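The full abstract recorded in this entry names reseeding of low-weight clusters, controlled by a weight threshold, as one of the improvements. The sketch below is only a rough illustration of that general idea on one-dimensional data; it is not the authors' algorithm. The function name, the default threshold of 0.2 / k, the choice to reseed only during the first half of the iterations, and the reseeding rule (restart at a random data point) are all assumptions made for this example.

```python
import math
import random


def em_1d_with_reseeding(xs, k, n_iter=40, weight_threshold=None, seed=0):
    """EM for a 1-D Gaussian mixture with a hypothetical low-weight reseeding step."""
    rng = random.Random(seed)
    n = len(xs)
    if weight_threshold is None:
        weight_threshold = 0.2 / k  # assumed default, not taken from the article
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n + 1e-6
    means = rng.sample(xs, k)      # start each cluster at a distinct data point
    variances = [var_all] * k
    weights = [1.0 / k] * k

    for it in range(n_iter):
        # E step: accumulate responsibility-weighted sufficient statistics.
        nk, sx, sxx = [0.0] * k, [0.0] * k, [0.0] * k
        for x in xs:
            logp = [
                math.log(weights[j])
                - 0.5 * math.log(2 * math.pi * variances[j])
                - 0.5 * (x - means[j]) ** 2 / variances[j]
                for j in range(k)
            ]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            for j in range(k):
                r = p[j] / total
                nk[j] += r
                sx[j] += r * x
                sxx[j] += r * x * x

        # M step: update weights, means and variances from the statistics.
        for j in range(k):
            nj = nk[j] + 1e-12  # guard against an empty cluster
            weights[j] = nj / n
            means[j] = sx[j] / nj
            variances[j] = max(sxx[j] / nj - means[j] ** 2, 1e-6)

        # Reseeding (assumption: only during the first half of the run).
        if it < n_iter // 2:
            for j in range(k):
                if weights[j] < weight_threshold:
                    means[j] = rng.choice(xs)
                    variances[j] = var_all
                    weights[j] = 1.0 / k

        # Renormalise so the weights sum to one after any reseeding.
        total_w = sum(weights)
        weights = [w / total_w for w in weights]

    return weights, means, variances
```

A call such as `em_1d_with_reseeding(data, k=3)` returns three lists (weights, means, variances). Lowering `weight_threshold` makes reseeding rarer; setting it to 0 reduces the loop to plain EM.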

Full description

Bibliographic details
Published in:	Knowledge and information systems 2005-02, Vol.7 (2), p.135-157
Main authors:	Ordonez, Carlos; Omiecinski, Edward
Format:	Article
Language:	eng
Subjects:	Algorithms; Cluster analysis; Clustering; Clusters; Convergence; Data mining; Datasets; Information systems; Parameters; Probability; Quality control; Splitting; Thresholds
Online access:	Full text

container_end_page	157
container_issue	2
container_start_page	135
container_title	Knowledge and information systems
container_volume	7
creator	Ordonez, Carlos ; Omiecinski, Edward
description	Clustering is one of the most important techniques used in data mining. This article focuses on the EM clustering algorithm. Two fundamental aspects are studied: achieving faster convergence and finding higher quality clustering solutions. This work introduces several improvements to the EM clustering algorithm, the most important being periodic M steps during initial iterations, reseeding of low-weight clusters, and splitting of high-weight clusters. These improvements lead to two important parameters. The first is the number of M steps per iteration; the second is a weight threshold used to reseed low-weight clusters. Experiments show how frequently the M step must be executed and what weight threshold values make EM reach higher quality solutions. In general, the improved EM clustering algorithm finds higher quality solutions than the classical EM algorithm and converges in fewer iterations.
doi_str_mv	10.1007/s10115-003-0141-6
format	Article
publisher	London: Springer Nature B.V
coden	KISNCR
rights	Springer-Verlag 2005
tpages	23
toplevel	peer_reviewed ; online_resources
fulltext	fulltext
identifier	ISSN: 0219-1377
ispartof	Knowledge and information systems, 2005-02, Vol.7 (2), p.135-157
issn	0219-1377 0219-3116
language	eng
recordid	cdi_proquest_miscellaneous_1835601859
source	Springer Nature - Complete Springer Journals
subjects	Algorithms ; Cluster analysis ; Clustering ; Clusters ; Convergence ; Data mining ; Datasets ; Information systems ; Parameters ; Probability ; Quality control ; Splitting ; Thresholds
title	Accelerating EM clustering to find high-quality solutions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T10%3A35%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accelerating%20EM%20clustering%20to%20find%20high-quality%20solutions&rft.jtitle=Knowledge%20and%20information%20systems&rft.au=Ordonez,%20Carlos&rft.date=2005-02-01&rft.volume=7&rft.issue=2&rft.spage=135&rft.epage=157&rft.pages=135-157&rft.issn=0219-1377&rft.eissn=0219-3116&rft.coden=KISNCR&rft_id=info:doi/10.1007/s10115-003-0141-6&rft_dat=%3Cproquest_cross%3E786542721%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=204300318&rft_id=info:pmid/&rfr_iscdi=true