Size Constrained Clustering With MILP Formulation

Clustering is one of the essential tools for data mining since it reveals the natural structures of the unlabeled data. Many clustering algorithms have been proposed in the last decades. However, few of them are designed to adapt prior knowledge that is available in many real applications, such as t...

Detailed description

Bibliographic details
Published in: IEEE access 2020, Vol.8, p.1587-1599
Main authors: Tang, Wei; Yang, Yang; Zeng, Lanling; Zhan, Yongzhao
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1599
container_issue
container_start_page 1587
container_title IEEE access
container_volume 8
creator Tang, Wei
Yang, Yang
Zeng, Lanling
Zhan, Yongzhao
description Clustering is one of the essential tools for data mining since it reveals the natural structures of unlabeled data. Many clustering algorithms have been proposed in recent decades. However, few of them are designed to incorporate prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose constraints on the sizes of clusters. Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously satisfying the size constraints. Each iteration of the proposed method consists of two steps, namely an assignment step and an update step. In the assignment step, the observations are assigned to clusters under the size constraints. The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is totally unimodular. Therefore, the integer constraints on most of the variables can be omitted, so that the problem becomes a mixed integer linear programming (MILP) problem, which is much easier to solve. In the update step, the cluster centroids are updated as the centers of the observations in the corresponding clusters. Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed could improve the clustering performance; (2) compared with the state-of-the-art size constrained clustering methods, the proposed method could efficiently derive better clustering results.
doi_str_mv 10.1109/ACCESS.2019.2962191
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2019_2962191</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8943118</ieee_id><doaj_id>oai_doaj_org_article_c2d51565410f4c698de2781f184949eb</doaj_id><sourcerecordid>2454716169</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</originalsourceid><addsrcrecordid>eNpNkE1Lw0AQhoMoWGp_QS8Bz6k7-5XssYRWCxWFFjwu291N3ZJm6yY56K93a0pxLvPBvO8MT5JMAc0AkHial-Vis5lhBGKGBccg4CYZYeAiI4zw23_1fTJp2wOKUcQRy0cJbNyPTUvftF1QrrEmLeu-7WxwzT79cN1n-rpav6dLH459rTrnm4fkrlJ1ayeXPE62y8W2fMnWb8-rcr7ONEVFl1WEiJwKKxS1GhlDgXPGY49hh7EGpSukkd1xbgQTNBeIYWQLiwwCZRAZJ6vB1nh1kKfgjip8S6-c_Bv4sJcqdE7XVmpsGDDOKKCKai4KY3FeQAUFFfGDXfR6HLxOwX_1tu3kwfehid9LTBnNgUcacYsMWzr4tg22ul4FJM-k5UBanknLC-momg4qZ629KgpBCUBBfgHsCXcI</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2454716169</pqid></control><display><type>article</type><title>Size Constrained Clustering With MILP Formulation</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, Yongzhao</creator><creatorcontrib>Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, Yongzhao</creatorcontrib><description>Clustering is one of the essential tools for data mining since it reveals the natural structures of the unlabeled data. Many clustering algorithms have been proposed in the last decades. However, few of them are designed to adapt prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose the constraints on the sizes of clusters. 
Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously considers the size constraints. Each iteration of the proposed method consists of two steps, namely an assignment step and an update step. In the assignment step, the observations are assigned into clusters under the size constraints. The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is total unimodular. Therefore, the integer constraints on most of the variables can be omitted so that the problem would become a mixed integer programming (MILP) problem which is much easier to solve. In the update step, new cluster centroids will be updated as the centers of the observations in the corresponding clusters. Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed could improve the clustering performance; (2) compared with the state-of-the-art size constrained clustering methods, the proposed method could efficiently derive better clustering results.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2019.2962191</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Centroids ; Clustering ; Clustering algorithms ; Clustering methods ; Constraint modelling ; Data mining ; Indexes ; Integer programming ; Iterative methods ; linear program ; Linear programming ; Matrix methods ; mean squared error ; Mixed integer ; Partitioning algorithms ; size constraints ; Task analysis</subject><ispartof>IEEE access, 2020, Vol.8, p.1587-1599</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</citedby><cites>FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</cites><orcidid>0000-0002-7727-5649 ; 0000-0001-7475-2895 ; 0000-0001-8782-4819 ; 0000-0003-3414-2421</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8943118$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>315,781,785,865,2103,4025,27635,27925,27926,27927,54935</link.rule.ids></links><search><creatorcontrib>Tang, Wei</creatorcontrib><creatorcontrib>Yang, Yang</creatorcontrib><creatorcontrib>Zeng, Lanling</creatorcontrib><creatorcontrib>Zhan, Yongzhao</creatorcontrib><title>Size Constrained Clustering With MILP Formulation</title><title>IEEE access</title><addtitle>Access</addtitle><description>Clustering is one of the essential tools for data mining since it reveals the natural structures of the unlabeled data. Many clustering algorithms have been proposed in the last decades. However, few of them are designed to adapt prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose the constraints on the sizes of clusters. Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously considers the size constraints. Each iteration of the proposed method consists of two steps, namely an assignment step and an update step. In the assignment step, the observations are assigned into clusters under the size constraints. 
The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is total unimodular. Therefore, the integer constraints on most of the variables can be omitted so that the problem would become a mixed integer programming (MILP) problem which is much easier to solve. In the update step, new cluster centroids will be updated as the centers of the observations in the corresponding clusters. Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed could improve the clustering performance; (2) compared with the state-of-the-art size constrained clustering methods, the proposed method could efficiently derive better clustering results.</description><subject>Algorithms</subject><subject>Centroids</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Clustering methods</subject><subject>Constraint modelling</subject><subject>Data mining</subject><subject>Indexes</subject><subject>Integer programming</subject><subject>Iterative methods</subject><subject>linear program</subject><subject>Linear programming</subject><subject>Matrix methods</subject><subject>mean squared error</subject><subject>Mixed integer</subject><subject>Partitioning algorithms</subject><subject>size constraints</subject><subject>Task 
analysis</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkE1Lw0AQhoMoWGp_QS8Bz6k7-5XssYRWCxWFFjwu291N3ZJm6yY56K93a0pxLvPBvO8MT5JMAc0AkHial-Vis5lhBGKGBccg4CYZYeAiI4zw23_1fTJp2wOKUcQRy0cJbNyPTUvftF1QrrEmLeu-7WxwzT79cN1n-rpav6dLH459rTrnm4fkrlJ1ayeXPE62y8W2fMnWb8-rcr7ONEVFl1WEiJwKKxS1GhlDgXPGY49hh7EGpSukkd1xbgQTNBeIYWQLiwwCZRAZJ6vB1nh1kKfgjip8S6-c_Bv4sJcqdE7XVmpsGDDOKKCKai4KY3FeQAUFFfGDXfR6HLxOwX_1tu3kwfehid9LTBnNgUcacYsMWzr4tg22ul4FJM-k5UBanknLC-momg4qZ629KgpBCUBBfgHsCXcI</recordid><startdate>2020</startdate><enddate>2020</enddate><creator>Tang, Wei</creator><creator>Yang, Yang</creator><creator>Zeng, Lanling</creator><creator>Zhan, Yongzhao</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7727-5649</orcidid><orcidid>https://orcid.org/0000-0001-7475-2895</orcidid><orcidid>https://orcid.org/0000-0001-8782-4819</orcidid><orcidid>https://orcid.org/0000-0003-3414-2421</orcidid></search><sort><creationdate>2020</creationdate><title>Size Constrained Clustering With MILP Formulation</title><author>Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, 
Yongzhao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Centroids</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Clustering methods</topic><topic>Constraint modelling</topic><topic>Data mining</topic><topic>Indexes</topic><topic>Integer programming</topic><topic>Iterative methods</topic><topic>linear program</topic><topic>Linear programming</topic><topic>Matrix methods</topic><topic>mean squared error</topic><topic>Mixed integer</topic><topic>Partitioning algorithms</topic><topic>size constraints</topic><topic>Task analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tang, Wei</creatorcontrib><creatorcontrib>Yang, Yang</creatorcontrib><creatorcontrib>Zeng, Lanling</creatorcontrib><creatorcontrib>Zhan, Yongzhao</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tang, Wei</au><au>Yang, Yang</au><au>Zeng, Lanling</au><au>Zhan, Yongzhao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Size Constrained Clustering With MILP Formulation</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2020</date><risdate>2020</risdate><volume>8</volume><spage>1587</spage><epage>1599</epage><pages>1587-1599</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Clustering is one of the essential tools for data mining since it reveals the natural structures of the unlabeled data. Many clustering algorithms have been proposed in the last decades. However, few of them are designed to adapt prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose the constraints on the sizes of clusters. Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously considers the size constraints. Each iteration of the proposed method consists of two steps, namely an assignment step and an update step. In the assignment step, the observations are assigned into clusters under the size constraints. The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is total unimodular. Therefore, the integer constraints on most of the variables can be omitted so that the problem would become a mixed integer programming (MILP) problem which is much easier to solve. In the update step, new cluster centroids will be updated as the centers of the observations in the corresponding clusters. 
Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed could improve the clustering performance; (2) compared with the state-of-the-art size constrained clustering methods, the proposed method could efficiently derive better clustering results.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2019.2962191</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-7727-5649</orcidid><orcidid>https://orcid.org/0000-0001-7475-2895</orcidid><orcidid>https://orcid.org/0000-0001-8782-4819</orcidid><orcidid>https://orcid.org/0000-0003-3414-2421</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2020, Vol.8, p.1587-1599
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2019_2962191
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Algorithms
Centroids
Clustering
Clustering algorithms
Clustering methods
Constraint modelling
Data mining
Indexes
Integer programming
Iterative methods
linear program
Linear programming
Matrix methods
mean squared error
Mixed integer
Partitioning algorithms
size constraints
Task analysis
title Size Constrained Clustering With MILP Formulation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T08%3A00%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Size%20Constrained%20Clustering%20With%20MILP%20Formulation&rft.jtitle=IEEE%20access&rft.au=Tang,%20Wei&rft.date=2020&rft.volume=8&rft.spage=1587&rft.epage=1599&rft.pages=1587-1599&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2019.2962191&rft_dat=%3Cproquest_cross%3E2454716169%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2454716169&rft_id=info:pmid/&rft_ieee_id=8943118&rft_doaj_id=oai_doaj_org_article_c2d51565410f4c698de2781f184949eb&rfr_iscdi=true
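The two-step iteration described in the abstract (a size-constrained assignment step followed by a centroid update step) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper solves the assignment step as a MILP, whereas the sketch below swaps in exhaustive enumeration, which is only practical for a handful of points. All function names (`assign`, `update`, `size_constrained_kmeans`) and the toy data are assumptions made for illustration, and every size is assumed to be at least 1.

```python
from itertools import combinations, permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _best_split(points, centroids, sizes, remaining, j):
    """Cheapest way to give clusters j, j+1, ... exactly sizes[j], ... points.

    Brute-force stand-in for the paper's MILP solver (assumption).
    Returns (cost, {point_index: cluster_index}).
    """
    if j == len(sizes):
        return 0.0, {}
    best_cost, best_map = float("inf"), None
    for chosen in combinations(remaining, sizes[j]):
        here = sum(sq_dist(points[i], centroids[j]) for i in chosen)
        rest = tuple(i for i in remaining if i not in chosen)
        rest_cost, rest_map = _best_split(points, centroids, sizes, rest, j + 1)
        if here + rest_cost < best_cost:
            best_cost = here + rest_cost
            best_map = {**rest_map, **{i: j for i in chosen}}
    return best_cost, best_map


def assign(points, centroids, sizes):
    """Assignment step: the sizes are an unordered set, so every distinct
    matching of sizes to clusters is tried and the cheapest one is kept."""
    if sum(sizes) != len(points) or len(sizes) != len(centroids):
        raise ValueError("sizes must sum to len(points), one size per centroid")
    best_cost, best_map = float("inf"), None
    everyone = tuple(range(len(points)))
    for perm in sorted(set(permutations(sizes))):
        cost, mapping = _best_split(points, centroids, perm, everyone, 0)
        if cost < best_cost:
            best_cost, best_map = cost, mapping
    return [best_map[i] for i in range(len(points))]


def update(points, labels, k):
    """Update step: each centroid becomes the mean of its assigned points."""
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids


def size_constrained_kmeans(points, init_centroids, sizes, max_iter=20):
    """Alternate the two steps until the assignment stops changing."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids, sizes)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, len(sizes))
    return labels, centroids


# Toy data (hypothetical): two visually obvious groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = size_constrained_kmeans(
    points, [(0.0, 0.0), (10.0, 10.0)], sizes=[2, 4]
)
print(labels, centroids)
```

With sizes `[2, 4]` the result is forced away from the natural 3/3 split, which shows how the size constraints shape the partition. The enumeration cost grows combinatorially with the number of points, which is why a real implementation would pass the same objective and constraints to an LP/MILP solver instead.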