Size Constrained Clustering With MILP Formulation
Clustering is one of the essential tools for data mining because it reveals the natural structure of unlabeled data. Many clustering algorithms have been proposed in recent decades. However, few of them are designed to incorporate prior knowledge that is available in many real applications, such as t...
Published in: | IEEE access 2020, Vol.8, p.1587-1599 |
---|---|
Main authors: | Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, Yongzhao |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1599 |
---|---|
container_issue | |
container_start_page | 1587 |
container_title | IEEE access |
container_volume | 8 |
creator | Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, Yongzhao |
description | Clustering is one of the essential tools for data mining because it reveals the natural structure of unlabeled data. Many clustering algorithms have been proposed in recent decades. However, few of them are designed to incorporate prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose constraints on the sizes of clusters. Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously satisfying the size constraints. Each iteration of the proposed method consists of two steps: an assignment step and an update step. In the assignment step, the observations are assigned to clusters under the size constraints. The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is totally unimodular. Therefore, the integer constraints on most of the variables can be omitted, so that the problem becomes a mixed integer linear programming (MILP) problem, which is much easier to solve. In the update step, each cluster centroid is updated to the center of the observations in the corresponding cluster. Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed can improve clustering performance; and (2) compared with state-of-the-art size constrained clustering methods, the proposed method efficiently derives better clustering results. |
doi_str_mv | 10.1109/ACCESS.2019.2962191 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2019_2962191</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8943118</ieee_id><doaj_id>oai_doaj_org_article_c2d51565410f4c698de2781f184949eb</doaj_id><sourcerecordid>2454716169</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</originalsourceid><addsrcrecordid>eNpNkE1Lw0AQhoMoWGp_QS8Bz6k7-5XssYRWCxWFFjwu291N3ZJm6yY56K93a0pxLvPBvO8MT5JMAc0AkHial-Vis5lhBGKGBccg4CYZYeAiI4zw23_1fTJp2wOKUcQRy0cJbNyPTUvftF1QrrEmLeu-7WxwzT79cN1n-rpav6dLH459rTrnm4fkrlJ1ayeXPE62y8W2fMnWb8-rcr7ONEVFl1WEiJwKKxS1GhlDgXPGY49hh7EGpSukkd1xbgQTNBeIYWQLiwwCZRAZJ6vB1nh1kKfgjip8S6-c_Bv4sJcqdE7XVmpsGDDOKKCKai4KY3FeQAUFFfGDXfR6HLxOwX_1tu3kwfehid9LTBnNgUcacYsMWzr4tg22ul4FJM-k5UBanknLC-momg4qZ629KgpBCUBBfgHsCXcI</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2454716169</pqid></control><display><type>article</type><title>Size Constrained Clustering With MILP Formulation</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, Yongzhao</creator><creatorcontrib>Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, Yongzhao</creatorcontrib><description>Clustering is one of the essential tools for data mining since it reveals the natural structures of the unlabeled data. Many clustering algorithms have been proposed in the last decades. However, few of them are designed to adapt prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose the constraints on the sizes of clusters. 
Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously considers the size constraints. Each iteration of the proposed method consists of two steps, namely an assignment step and an update step. In the assignment step, the observations are assigned into clusters under the size constraints. The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is total unimodular. Therefore, the integer constraints on most of the variables can be omitted so that the problem would become a mixed integer programming (MILP) problem which is much easier to solve. In the update step, new cluster centroids will be updated as the centers of the observations in the corresponding clusters. Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed could improve the clustering performance; (2) compared with the state-of-the-art size constrained clustering methods, the proposed method could efficiently derive better clustering results.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2019.2962191</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Centroids ; Clustering ; Clustering algorithms ; Clustering methods ; Constraint modelling ; Data mining ; Indexes ; Integer programming ; Iterative methods ; linear program ; Linear programming ; Matrix methods ; mean squared error ; Mixed integer ; Partitioning algorithms ; size constraints ; Task analysis</subject><ispartof>IEEE access, 2020, Vol.8, p.1587-1599</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</citedby><cites>FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</cites><orcidid>0000-0002-7727-5649 ; 0000-0001-7475-2895 ; 0000-0001-8782-4819 ; 0000-0003-3414-2421</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8943118$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>315,781,785,865,2103,4025,27635,27925,27926,27927,54935</link.rule.ids></links><search><creatorcontrib>Tang, Wei</creatorcontrib><creatorcontrib>Yang, Yang</creatorcontrib><creatorcontrib>Zeng, Lanling</creatorcontrib><creatorcontrib>Zhan, Yongzhao</creatorcontrib><title>Size Constrained Clustering With MILP Formulation</title><title>IEEE access</title><addtitle>Access</addtitle><description>Clustering is one of the essential tools for data mining since it reveals the natural structures of the unlabeled data. Many clustering algorithms have been proposed in the last decades. However, few of them are designed to adapt prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose the constraints on the sizes of clusters. Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously considers the size constraints. Each iteration of the proposed method consists of two steps, namely an assignment step and an update step. In the assignment step, the observations are assigned into clusters under the size constraints. 
The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is total unimodular. Therefore, the integer constraints on most of the variables can be omitted so that the problem would become a mixed integer programming (MILP) problem which is much easier to solve. In the update step, new cluster centroids will be updated as the centers of the observations in the corresponding clusters. Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed could improve the clustering performance; (2) compared with the state-of-the-art size constrained clustering methods, the proposed method could efficiently derive better clustering results.</description><subject>Algorithms</subject><subject>Centroids</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Clustering methods</subject><subject>Constraint modelling</subject><subject>Data mining</subject><subject>Indexes</subject><subject>Integer programming</subject><subject>Iterative methods</subject><subject>linear program</subject><subject>Linear programming</subject><subject>Matrix methods</subject><subject>mean squared error</subject><subject>Mixed integer</subject><subject>Partitioning algorithms</subject><subject>size constraints</subject><subject>Task 
analysis</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkE1Lw0AQhoMoWGp_QS8Bz6k7-5XssYRWCxWFFjwu291N3ZJm6yY56K93a0pxLvPBvO8MT5JMAc0AkHial-Vis5lhBGKGBccg4CYZYeAiI4zw23_1fTJp2wOKUcQRy0cJbNyPTUvftF1QrrEmLeu-7WxwzT79cN1n-rpav6dLH459rTrnm4fkrlJ1ayeXPE62y8W2fMnWb8-rcr7ONEVFl1WEiJwKKxS1GhlDgXPGY49hh7EGpSukkd1xbgQTNBeIYWQLiwwCZRAZJ6vB1nh1kKfgjip8S6-c_Bv4sJcqdE7XVmpsGDDOKKCKai4KY3FeQAUFFfGDXfR6HLxOwX_1tu3kwfehid9LTBnNgUcacYsMWzr4tg22ul4FJM-k5UBanknLC-momg4qZ629KgpBCUBBfgHsCXcI</recordid><startdate>2020</startdate><enddate>2020</enddate><creator>Tang, Wei</creator><creator>Yang, Yang</creator><creator>Zeng, Lanling</creator><creator>Zhan, Yongzhao</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7727-5649</orcidid><orcidid>https://orcid.org/0000-0001-7475-2895</orcidid><orcidid>https://orcid.org/0000-0001-8782-4819</orcidid><orcidid>https://orcid.org/0000-0003-3414-2421</orcidid></search><sort><creationdate>2020</creationdate><title>Size Constrained Clustering With MILP Formulation</title><author>Tang, Wei ; Yang, Yang ; Zeng, Lanling ; Zhan, 
Yongzhao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-f339749e9a4ec0dd4166569e921b22c1acf0c0eb66d9594790520e8e0d01ad03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Centroids</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Clustering methods</topic><topic>Constraint modelling</topic><topic>Data mining</topic><topic>Indexes</topic><topic>Integer programming</topic><topic>Iterative methods</topic><topic>linear program</topic><topic>Linear programming</topic><topic>Matrix methods</topic><topic>mean squared error</topic><topic>Mixed integer</topic><topic>Partitioning algorithms</topic><topic>size constraints</topic><topic>Task analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tang, Wei</creatorcontrib><creatorcontrib>Yang, Yang</creatorcontrib><creatorcontrib>Zeng, Lanling</creatorcontrib><creatorcontrib>Zhan, Yongzhao</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tang, Wei</au><au>Yang, Yang</au><au>Zeng, Lanling</au><au>Zhan, Yongzhao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Size Constrained Clustering With MILP Formulation</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2020</date><risdate>2020</risdate><volume>8</volume><spage>1587</spage><epage>1599</epage><pages>1587-1599</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Clustering is one of the essential tools for data mining since it reveals the natural structures of the unlabeled data. Many clustering algorithms have been proposed in the last decades. However, few of them are designed to adapt prior knowledge that is available in many real applications, such as the sizes of clusters. In this paper, we propose a novel iterative clustering algorithm that can impose the constraints on the sizes of clusters. Given an unordered set of cluster size constraints, the proposed method minimizes the mean squared error (MSE) while simultaneously considers the size constraints. Each iteration of the proposed method consists of two steps, namely an assignment step and an update step. In the assignment step, the observations are assigned into clusters under the size constraints. The assignment task is modeled as an integer linear programming (ILP) problem. We prove that part of the constraint matrix of this ILP problem is total unimodular. Therefore, the integer constraints on most of the variables can be omitted so that the problem would become a mixed integer programming (MILP) problem which is much easier to solve. In the update step, new cluster centroids will be updated as the centers of the observations in the corresponding clusters. 
Experiments on UCI data sets indicate that (1) imposing the size constraints as proposed could improve the clustering performance; (2) compared with the state-of-the-art size constrained clustering methods, the proposed method could efficiently derive better clustering results.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2019.2962191</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-7727-5649</orcidid><orcidid>https://orcid.org/0000-0001-7475-2895</orcidid><orcidid>https://orcid.org/0000-0001-8782-4819</orcidid><orcidid>https://orcid.org/0000-0003-3414-2421</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2020, Vol.8, p.1587-1599 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2019_2962191 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Algorithms ; Centroids ; Clustering ; Clustering algorithms ; Clustering methods ; Constraint modelling ; Data mining ; Indexes ; Integer programming ; Iterative methods ; linear program ; Linear programming ; Matrix methods ; mean squared error ; Mixed integer ; Partitioning algorithms ; size constraints ; Task analysis |
title | Size Constrained Clustering With MILP Formulation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T08%3A00%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Size%20Constrained%20Clustering%20With%20MILP%20Formulation&rft.jtitle=IEEE%20access&rft.au=Tang,%20Wei&rft.date=2020&rft.volume=8&rft.spage=1587&rft.epage=1599&rft.pages=1587-1599&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2019.2962191&rft_dat=%3Cproquest_cross%3E2454716169%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2454716169&rft_id=info:pmid/&rft_ieee_id=8943118&rft_doaj_id=oai_doaj_org_article_c2d51565410f4c698de2781f184949eb&rfr_iscdi=true |
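
The alternating assignment and update steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It makes two simplifying assumptions: each cluster is given its own fixed size (the paper works with an unordered set of sizes), and the ILP/MILP solver is replaced by an exhaustive search that is only practical for a handful of points. All function names (`assign_with_sizes`, `update_centroids`, `size_constrained_kmeans`) are hypothetical.

```python
from itertools import product


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_with_sizes(points, centroids, sizes):
    """Assignment step: find the cheapest labelling in which cluster j
    receives exactly sizes[j] points.

    Assumption: exhaustive search stands in for the ILP/MILP solver,
    so this only scales to very small inputs.
    """
    if sum(sizes) != len(points):
        raise ValueError("cluster sizes must sum to the number of points")
    k = len(centroids)
    best_cost, best_labels = float("inf"), None
    for labels in product(range(k), repeat=len(points)):
        # Skip labellings that violate the size constraints.
        if any(labels.count(j) != sizes[j] for j in range(k)):
            continue
        cost = sum(sq_dist(p, centroids[j]) for p, j in zip(points, labels))
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return list(best_labels)


def update_centroids(points, labels, k):
    """Update step: each centroid becomes the mean of its members.

    Assumes every cluster size is at least 1, so no cluster is empty.
    """
    dim = len(points[0])
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return centroids


def size_constrained_kmeans(points, init_centroids, sizes, max_iter=20):
    """Alternate the two steps until the labelling stops changing."""
    centroids = list(init_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = assign_with_sizes(points, centroids, sizes)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update_centroids(points, labels, len(sizes))
    return labels, centroids


# Toy data: four points near the origin and two points far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels, centroids = size_constrained_kmeans(
    points, [(0.0, 0.0), (6.0, 5.0)], sizes=[4, 2]
)
```

With sizes `[4, 2]` the constraint matches the natural grouping of the toy data. Calling the same function with `sizes=[3, 3]` forces one of the points near the origin into the second cluster, which shows how the size constraint overrides the nearest-centroid rule of plain k-means.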