Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model

• A new hybrid method for the imputation of missing values is proposed.
• The method is based on a novel fuzzy c-means, mutual information, and regression.
• Performance of imputation increases by using Grey in the fuzzy c-means algorithm.
• The proposed method outperforms five existing imputation methods,...


Bibliographic details
Published in:	Expert systems with applications 2019-01, Vol.115, p.68-94
Main authors:	Sefidian, Amir Masoud; Daneshpour, Negin
Format:	Article
Language:	English
Subjects:	Algorithms; Classification; Clustering; Decision making; Fuzzy c-means; Grey relational analysis; Knowledge discovery; Missing data; Missing data imputation; Mutual information; Quality; Regression; Regression models
Online access:	Full text

container_end_page	94
container_issue
container_start_page	68
container_title	Expert systems with applications
container_volume	115
creator	Sefidian, Amir Masoud ; Daneshpour, Negin
description	Highlights: • A new hybrid method for the imputation of missing values is proposed. • The method is based on a novel fuzzy c-means, mutual information, and regression. • Imputation performance increases when Grey is used in the fuzzy c-means algorithm. • The proposed method outperforms five existing imputation methods in most cases. • The proposed method can also provide high classification accuracies. Abstract: Missing values in real-world data are both prevalent and inevitable, so they should be handled carefully before mining or learning. This paper proposes a novel technique for imputing missing data. It employs a new version of the fuzzy c-means clustering algorithm that exploits the advantages of the Grey Relational Grade over Minkowski-like similarity measures. To impute a missing value more accurately, it also performs local mutual-information-based feature selection within each cluster so that only highly relevant features are used. Missing values are imputed in the following steps. First, the algorithm determines the importance of each missing attribute. Next, the input instances are partitioned into several fuzzy clusters. Then, the algorithm selects the clusters that satisfy a minimum condition. After that, it chooses the highly dependent features of the instances within each cluster using a mutual-information-based feature selection approach. Once the features are selected, regression models are applied to the selected features of the selected clusters to produce estimates of the missing value. Finally, the missing value is imputed as a weighted average of these estimates. Three well-known evaluation criteria and the accuracy of a classification task are used to assess the performance of the proposed method. Experimental results on seven UCI data sets with different missing ratios and strategies indicate that the proposed algorithm generally outperforms five other imputation methods.
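The similarity measure and the last two steps named in the abstract (Grey Relational Grade, per-cluster regression, weighted averaging of the estimates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the distinguishing coefficient rho = 0.5, the use of ordinary least squares, and the choice of weights are all assumptions, since this record does not specify them. The grey relational coefficient uses the standard form from grey relational analysis.

```python
import numpy as np


def grey_relational_grade(reference, candidates, rho=0.5):
    """Grey Relational Grade between one reference vector and each candidate row.

    Standard grey relational coefficient, averaged over the features.
    rho (the distinguishing coefficient) = 0.5 is an assumed, commonly used value.
    """
    diff = np.abs(candidates - reference)      # absolute differences, shape (n, d)
    d_min, d_max = diff.min(), diff.max()      # global minimum and maximum difference
    if d_max == 0.0:                           # every candidate equals the reference
        return np.ones(len(candidates))
    coeff = (d_min + rho * d_max) / (diff + rho * d_max)
    return coeff.mean(axis=1)                  # one grade in (0, 1] per candidate


def regression_estimate(X_known, y_known, x_query):
    """Estimate a missing value with a least-squares linear model (assumed model).

    X_known: complete instances of one cluster, restricted to the selected features.
    y_known: observed values of the attribute that is missing in the query instance.
    """
    A = np.column_stack([np.ones(len(X_known)), X_known])   # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y_known, rcond=None)
    return float(np.concatenate(([1.0], x_query)) @ beta)


def weighted_estimate(estimates, weights):
    """Combine per-cluster estimates into one imputed value by weighted average."""
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * estimates) / np.sum(weights))


if __name__ == "__main__":
    # Hypothetical toy data: two clusters, one selected feature each.
    cluster_a = (np.array([[0.0], [1.0], [2.0]]), np.array([1.0, 3.0, 5.0]))
    cluster_b = (np.array([[0.0], [2.0], [4.0]]), np.array([2.0, 4.0, 6.0]))
    query = np.array([3.0])                    # observed feature of the incomplete instance
    estimates = [regression_estimate(X, y, query) for X, y in (cluster_a, cluster_b)]
    # Assumption: weight each cluster by the query's grade against the cluster means.
    centers = np.array([cluster_a[0].mean(axis=0), cluster_b[0].mean(axis=0)])
    weights = grey_relational_grade(query, centers)
    print(weighted_estimate(estimates, weights))
```

In this sketch the weights come from the grey relational grade between the incomplete instance and each cluster mean; fuzzy membership degrees from the clustering step would be a natural alternative.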
doi_str_mv	10.1016/j.eswa.2018.07.057
format	Article
publisher	New York: Elsevier Ltd
rights	2018 Elsevier Ltd ; Copyright Elsevier BV Jan 2019
tpages	27
fulltext	fulltext
identifier	ISSN: 0957-4174
ispartof	Expert systems with applications, 2019-01, Vol.115, p.68-94
issn	0957-4174 1873-6793
language	eng
recordid	cdi_proquest_journals_2131209906
source	Elsevier ScienceDirect Journals Complete
subjects	Algorithms ; Classification ; Clustering ; Decision making ; Fuzzy c-means ; Grey relational analysis ; Knowledge discovery ; Missing data ; Missing data imputation ; Mutual information ; Quality ; Regression ; Regression models
title	Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T14%3A13%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Missing%20value%20imputation%20using%20a%20novel%20grey%20based%20fuzzy%20c-means,%20mutual%20information%20based%20feature%20selection,%20and%20regression%20model&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Sefidian,%20Amir%20Masoud&rft.date=2019-01&rft.volume=115&rft.spage=68&rft.epage=94&rft.pages=68-94&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2018.07.057&rft_dat=%3Cproquest_cross%3E2131209906%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2131209906&rft_id=info:pmid/&rft_els_id=S0957417418304822&rfr_iscdi=true