Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values

Missing values are a common phenomenon when dealing with real-world data sets. Analysis of incomplete data sets has become an active area of research. In this paper, we focus on the problem of clustering incomplete data, which is intended to introduce some prior distribution information of the missi...
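The idea named in the abstract, describing each missing value by a Gaussian estimated from the nearest neighbors of the incomplete record and then clustering with Fuzzy C-Means, can be illustrated with a minimal sketch. This is not the authors' algorithm: it replaces their non-parametric hypothesis testing with a plain k-nearest-neighbor mean and standard deviation, and it fills the gap with the mean instead of optimizing a maximum likelihood model. The function names, the partial-distance rescaling, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import math

def partial_distance(a, b):
    """Squared Euclidean distance over attributes observed in both records,
    rescaled by the share of attributes used (None marks a missing value)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return len(a) / len(shared) * sum((x - y) ** 2 for x, y in shared)

def gaussian_granule(data, i, j, k=3):
    """Hypothetical helper: estimate (mean, std) for missing attribute j of
    record i from the k nearest records in which attribute j is observed."""
    candidates = [(partial_distance(data[i], row), row[j])
                  for n, row in enumerate(data)
                  if n != i and row[j] is not None]
    nearest = [value for _, value in sorted(candidates)[:k]]
    mean = sum(nearest) / len(nearest)
    var = sum((v - mean) ** 2 for v in nearest) / len(nearest)
    return mean, math.sqrt(var)

def fcm_memberships(point, prototypes, m=2.0):
    """Standard Fuzzy C-Means membership update for one complete point,
    using squared distances and fuzzifier m > 1."""
    d = [sum((x - v) ** 2 for x, v in zip(point, proto)) for proto in prototypes]
    zeros = sum(1 for dist in d if dist == 0.0)
    if zeros:
        # The point coincides with one or more prototypes: share membership.
        return [1.0 / zeros if dist == 0.0 else 0.0 for dist in d]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / dist) ** power for dist in d]
    total = sum(inv)
    return [w / total for w in inv]

# Toy data (assumed): the third record is missing its second attribute.
data = [[1.0, 2.0], [1.2, 1.8], [0.9, None], [5.0, 6.0], [5.2, 6.1]]
mean, std = gaussian_granule(data, i=2, j=1, k=2)

# Simplification: fill the gap with the granule mean, then compute memberships
# against two assumed prototypes.
filled = [data[2][0], mean]
u = fcm_memberships(filled, prototypes=[[1.0, 2.0], [5.0, 6.0]])
```

In this sketch the estimated standard deviation is computed but not used. In the approach the abstract describes, the distribution of each missing value enters the clustering objective itself, and missing values, memberships, and prototypes are updated in turn.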

Full description

Bibliographic details
Published in:	Knowledge-based systems 2016-05, Vol.99, p.51-70
Main authors:	Zhang, Liyong; Lu, Wei; Liu, Xiaodong; Pedrycz, Witold; Zhong, Chongquan
Format:	Article
Language:	eng
Subjects:	Algorithms; Alternating optimization; Clustering; Fuzzy; Fuzzy clustering; Granular materials; Granules; Incomplete data; Missing value; Probabilistic information granules; Probabilistic methods; Probability theory
Online access:	Full text

container_end_page	70
container_issue
container_start_page	51
container_title	Knowledge-based systems
container_volume	99
creator	Zhang, Liyong; Lu, Wei; Liu, Xiaodong; Pedrycz, Witold; Zhong, Chongquan
description	Missing values are a common phenomenon when dealing with real-world data sets. Analysis of incomplete data sets has become an active area of research. In this paper, we focus on the problem of clustering incomplete data, which is intended to introduce some prior distribution information of the missing values into the algorithm of fuzzy clustering. First, non-parametric hypothesis testing is employed to describe the missing values adhering to a certain Gaussian distribution as probabilistic information granules based on the nearest neighbors of incomplete data. Second, we propose a novel clustering model, in which probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by involving the maximum likelihood criterion. Third, the clustering model is optimized by using a tri-level alternating optimization utilizing the method of Lagrange multipliers. The convergence and the time complexity of the clustering algorithm are also discussed. The experiments reported both on synthetic and real-world data sets demonstrate that the proposed approach can effectively realize clustering of incomplete data.
doi_str_mv	10.1016/j.knosys.2016.01.048
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1816039018</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0950705116000782</els_id><sourcerecordid>1816039018</sourcerecordid><originalsourceid>FETCH-LOGICAL-c545t-45f3db02db6e8d7e546a8ce9e69afcf629e29c285ab2a14e22152eeea45774953</originalsourceid><addsrcrecordid>eNp9UMtOwzAQtBBIlMcfcMiRS4Lt2nlckFBFAamIC5wtx9lULklcvE6l9utxFM6cVqudmZ0ZQu4YzRhl-cMu-x4cHjHjccsoy6goz8iClQVPC0Grc7KglaRpQSW7JFeIO0op56xcELseT6djskrfQQ-YmG7EAN4O28S1iR2M6_cdBEgaHXRSa4QmcUOy967Wte0sBmsirHW-18HGy9brYewAJ3pvESelg-5GwBty0eoO4fZvXpOv9fPn6jXdfLy8rZ42qZFChlTIdtnUlDd1DmVTgBS5Lg1UkFe6NW3OK-CV4aXUNddMQIwhOQBoIYtCVHJ5Te5n3WjyJ_4NKvow0HV6ADeiYiXL6bKirIxQMUONd4geWrX3ttf-qBhVU7Nqp-Zm1dSsokzFZiPtcaZBjHGw4BUaC4OBxnowQTXO_i_wC3eXhmw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1816039018</pqid></control><display><type>article</type><title>Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Zhang, Liyong ; Lu, Wei ; Liu, Xiaodong ; Pedrycz, Witold ; Zhong, Chongquan</creator><creatorcontrib>Zhang, Liyong ; Lu, Wei ; Liu, Xiaodong ; Pedrycz, Witold ; Zhong, Chongquan</creatorcontrib><description>Missing values are a common phenomenon when dealing with real-world data sets. Analysis of incomplete data sets has become an active area of research. In this paper, we focus on the problem of clustering incomplete data, which is intended to introduce some prior distribution information of the missing values into the algorithm of fuzzy clustering. First, non-parametric hypothesis testing is employed to describe the missing values adhering to a certain Gaussian distribution as probabilistic information granules based on the nearest neighbors of incomplete data. 
Second, we propose a novel clustering model, in which probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by involving the maximum likelihood criterion. Third, the clustering model is optimized by using a tri-level alternating optimization utilizing the method of Lagrange multipliers. The convergence and the time complexity of the clustering algorithm are also discussed. The experiments reported both on synthetic and real-world data sets demonstrate that the proposed approach can effectively realize clustering of incomplete data.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2016.01.048</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Algorithms ; Alternating optimization ; Clustering ; Fuzzy ; Fuzzy clustering ; Granular materials ; Granules ; Incomplete data ; Missing value ; Probabilistic information granules ; Probabilistic methods ; Probability theory</subject><ispartof>Knowledge-based systems, 2016-05, Vol.99, p.51-70</ispartof><rights>2016 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c545t-45f3db02db6e8d7e546a8ce9e69afcf629e29c285ab2a14e22152eeea45774953</citedby><cites>FETCH-LOGICAL-c545t-45f3db02db6e8d7e546a8ce9e69afcf629e29c285ab2a14e22152eeea45774953</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0950705116000782$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27903,27904,65309</link.rule.ids></links><search><creatorcontrib>Zhang, Liyong</creatorcontrib><creatorcontrib>Lu, Wei</creatorcontrib><creatorcontrib>Liu, Xiaodong</creatorcontrib><creatorcontrib>Pedrycz, 
Witold</creatorcontrib><creatorcontrib>Zhong, Chongquan</creatorcontrib><title>Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values</title><title>Knowledge-based systems</title><description>Missing values are a common phenomenon when dealing with real-world data sets. Analysis of incomplete data sets has become an active area of research. In this paper, we focus on the problem of clustering incomplete data, which is intended to introduce some prior distribution information of the missing values into the algorithm of fuzzy clustering. First, non-parametric hypothesis testing is employed to describe the missing values adhering to a certain Gaussian distribution as probabilistic information granules based on the nearest neighbors of incomplete data. Second, we propose a novel clustering model, in which probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by involving the maximum likelihood criterion. Third, the clustering model is optimized by using a tri-level alternating optimization utilizing the method of Lagrange multipliers. The convergence and the time complexity of the clustering algorithm are also discussed. 
The experiments reported both on synthetic and real-world data sets demonstrate that the proposed approach can effectively realize clustering of incomplete data.</description><subject>Algorithms</subject><subject>Alternating optimization</subject><subject>Clustering</subject><subject>Fuzzy</subject><subject>Fuzzy clustering</subject><subject>Granular materials</subject><subject>Granules</subject><subject>Incomplete data</subject><subject>Missing value</subject><subject>Probabilistic information granules</subject><subject>Probabilistic methods</subject><subject>Probability theory</subject><issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9UMtOwzAQtBBIlMcfcMiRS4Lt2nlckFBFAamIC5wtx9lULklcvE6l9utxFM6cVqudmZ0ZQu4YzRhl-cMu-x4cHjHjccsoy6goz8iClQVPC0Grc7KglaRpQSW7JFeIO0op56xcELseT6djskrfQQ-YmG7EAN4O28S1iR2M6_cdBEgaHXRSa4QmcUOy967Wte0sBmsirHW-18HGy9brYewAJ3pvESelg-5GwBty0eoO4fZvXpOv9fPn6jXdfLy8rZ42qZFChlTIdtnUlDd1DmVTgBS5Lg1UkFe6NW3OK-CV4aXUNddMQIwhOQBoIYtCVHJ5Te5n3WjyJ_4NKvow0HV6ADeiYiXL6bKirIxQMUONd4geWrX3ttf-qBhVU7Nqp-Zm1dSsokzFZiPtcaZBjHGw4BUaC4OBxnowQTXO_i_wC3eXhmw</recordid><startdate>20160501</startdate><enddate>20160501</enddate><creator>Zhang, Liyong</creator><creator>Lu, Wei</creator><creator>Liu, Xiaodong</creator><creator>Pedrycz, Witold</creator><creator>Zhong, Chongquan</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20160501</creationdate><title>Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values</title><author>Zhang, Liyong ; Lu, Wei ; Liu, Xiaodong ; Pedrycz, Witold ; Zhong, 
Chongquan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c545t-45f3db02db6e8d7e546a8ce9e69afcf629e29c285ab2a14e22152eeea45774953</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Alternating optimization</topic><topic>Clustering</topic><topic>Fuzzy</topic><topic>Fuzzy clustering</topic><topic>Granular materials</topic><topic>Granules</topic><topic>Incomplete data</topic><topic>Missing value</topic><topic>Probabilistic information granules</topic><topic>Probabilistic methods</topic><topic>Probability theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Liyong</creatorcontrib><creatorcontrib>Lu, Wei</creatorcontrib><creatorcontrib>Liu, Xiaodong</creatorcontrib><creatorcontrib>Pedrycz, Witold</creatorcontrib><creatorcontrib>Zhong, Chongquan</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Liyong</au><au>Lu, Wei</au><au>Liu, Xiaodong</au><au>Pedrycz, Witold</au><au>Zhong, Chongquan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values</atitle><jtitle>Knowledge-based 
systems</jtitle><date>2016-05-01</date><risdate>2016</risdate><volume>99</volume><spage>51</spage><epage>70</epage><pages>51-70</pages><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>Missing values are a common phenomenon when dealing with real-world data sets. Analysis of incomplete data sets has become an active area of research. In this paper, we focus on the problem of clustering incomplete data, which is intended to introduce some prior distribution information of the missing values into the algorithm of fuzzy clustering. First, non-parametric hypothesis testing is employed to describe the missing values adhering to a certain Gaussian distribution as probabilistic information granules based on the nearest neighbors of incomplete data. Second, we propose a novel clustering model, in which probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by involving the maximum likelihood criterion. Third, the clustering model is optimized by using a tri-level alternating optimization utilizing the method of Lagrange multipliers. The convergence and the time complexity of the clustering algorithm are also discussed. The experiments reported both on synthetic and real-world data sets demonstrate that the proposed approach can effectively realize clustering of incomplete data.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2016.01.048</doi><tpages>20</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0950-7051
ispartof	Knowledge-based systems, 2016-05, Vol.99, p.51-70
issn	0950-7051 1872-7409
language	eng
recordid	cdi_proquest_miscellaneous_1816039018
source	Elsevier ScienceDirect Journals Complete
subjects	Algorithms; Alternating optimization; Clustering; Fuzzy; Fuzzy clustering; Granular materials; Granules; Incomplete data; Missing value; Probabilistic information granules; Probabilistic methods; Probability theory
title	Fuzzy C-Means clustering of incomplete data based on probabilistic information granules of missing values
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T20%3A16%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fuzzy%20C-Means%20clustering%20of%20incomplete%20data%20based%20on%20probabilistic%20information%20granules%20of%20missing%20values&rft.jtitle=Knowledge-based%20systems&rft.au=Zhang,%20Liyong&rft.date=2016-05-01&rft.volume=99&rft.spage=51&rft.epage=70&rft.pages=51-70&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2016.01.048&rft_dat=%3Cproquest_cross%3E1816039018%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1816039018&rft_id=info:pmid/&rft_els_id=S0950705116000782&rfr_iscdi=true