Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model

Bibliographic details
Published in: Expert Systems with Applications, January 2019, Vol. 115, pp. 68-94
Main authors: Sefidian, Amir Masoud; Daneshpour, Negin
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 94
container_issue
container_start_page 68
container_title Expert systems with applications
container_volume 115
creator Sefidian, Amir Masoud
Daneshpour, Negin
description •A new hybrid method for the imputation of missing values is proposed.
•The method is based on a novel fuzzy c-means, mutual information, and regression.
•Imputation performance increases when Grey relational analysis is used in the fuzzy c-means algorithm.
•The proposed method outperforms five existing imputation methods in most cases.
•The proposed method can also yield high classification accuracy.
The presence of missing values in real-world data is not only a prevalent problem but also an inevitable one. Missing values should therefore be handled carefully before the mining or learning process. This paper proposes a novel technique to impute missing data. It employs a new version of the Fuzzy c-Means clustering algorithm that exploits the advantages of the Grey Relational Grade over Minkowski-like similarity measures. To impute a missing value more accurately, it also performs local mutual-information-based feature selection in each cluster so that only highly relevant features are used. Briefly, missing values are imputed in the following steps. First, the algorithm determines the importance of each missing attribute. Next, the input instances are partitioned into several fuzzy clusters. The algorithm then selects the clusters that satisfy a minimum condition. Within each selected cluster, it chooses highly dependent features using a mutual-information-based feature selection approach. Regression models are then fitted on the selected features of the selected clusters to estimate the missing value. Finally, the missing value is imputed as a weighted average of these estimates. Three well-known evaluation criteria and classification accuracy are used to assess the performance of the proposed method. Experimental results on seven UCI data sets with different missing ratios and missingness strategies indicate that the proposed algorithm generally outperforms five other imputation methods.
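The imputation steps summarized above can be illustrated with a small NumPy sketch. It is not the authors' implementation: using 1 - GRG as the fuzzy c-means dissimilarity, keeping the standard weighted-mean center update, the threshold `min_membership` standing in for the unspecified "minimum condition", the histogram-based mutual-information estimator, ordinary least squares as the regression model, and all function names are assumptions made for illustration. The first step (ranking missing attributes by importance) is omitted.

```python
import numpy as np


def grey_relational_grade(x, refs, rho=0.5):
    """Grey relational grade of each row of `refs` w.r.t. `x` (1.0 = identical).
    Features are assumed to be on comparable scales (e.g. min-max normalized)."""
    delta = np.abs(refs - x)                   # |x_j - r_ij| for each reference i, feature j
    d_min, d_max = delta.min(), delta.max()    # extremes over all references and features
    if d_max == 0:                             # every reference equals x
        return np.ones(len(refs))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # grey relational coefficients
    return coeff.mean(axis=1)                  # grade = mean coefficient per reference


def memberships(x, centers, m=2.0, eps=1e-9):
    """Fuzzy c-means memberships of x, using 1 - GRG as dissimilarity (assumption)."""
    d = 1.0 - grey_relational_grade(x, centers) + eps
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)             # memberships sum to 1


def grey_fcm(X, c=2, m=2.0, iters=50, seed=0):
    """FCM loop; the weighted-mean center update is kept from standard FCM (heuristic)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(iters):
        U = np.array([memberships(x, centers, m) for x in X])  # shape (n, c)
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    U = np.array([memberships(x, centers, m) for x in X])      # memberships for final centers
    return centers, U


def mutual_information(a, b, bins=5):
    """Histogram-based mutual information estimate in nats (illustrative estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa, pb = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())


def impute_value(X_complete, x, target, c=2, k=2, min_membership=0.2, m=2.0):
    """Impute x[target] (NaN) from complete rows; names and thresholds are hypothetical."""
    obs = [j for j in range(X_complete.shape[1]) if j != target and not np.isnan(x[j])]
    centers, U = grey_fcm(X_complete[:, obs], c=c, m=m)
    u_x = memberships(x[obs], centers, m)
    estimates, weights = [], []
    for ci in np.flatnonzero(u_x >= min_membership):       # clusters passing the assumed condition
        rows = X_complete[U.argmax(axis=1) == ci]           # crisp members of cluster ci
        if len(rows) < k + 2:                               # too few rows to fit a regression
            continue
        mi = [mutual_information(rows[:, j], rows[:, target]) for j in obs]
        sel = [obs[i] for i in np.argsort(mi)[::-1][:k]]    # top-k features by MI with target
        A = np.column_stack([rows[:, sel], np.ones(len(rows))])
        coef, *_ = np.linalg.lstsq(A, rows[:, target], rcond=None)
        estimates.append(np.append(x[sel], 1.0) @ coef)     # per-cluster linear estimate
        weights.append(u_x[ci])                             # weight = membership of x
    if not estimates:                                       # fallback: column mean
        return float(X_complete[:, target].mean())
    return float(np.average(estimates, weights=weights))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f0 = np.concatenate([rng.normal(0, 1, 40), rng.normal(10, 1, 40)])  # two groups
    f1 = rng.normal(0, 1, 80)                                           # irrelevant feature
    X = np.column_stack([f0, f1, 2 * f0 + 1])                           # target = 2*f0 + 1
    print(impute_value(X, np.array([0.5, 0.0, np.nan]), target=2))
```

In the demo the target is an exact linear function of `f0`, so each per-cluster regression recovers it and the weighted average should come out at 2.0 up to floating-point error. On real data, features would normally be min-max normalized before grey relational grades are computed, since the unnormalized grade is dominated by large-scale features.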
doi_str_mv 10.1016/j.eswa.2018.07.057
format Article
publisher Elsevier Ltd (New York)
rights 2018 Elsevier Ltd
Copyright Elsevier BV Jan 2019
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2019-01, Vol.115, p.68-94
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_journals_2131209906
source Elsevier ScienceDirect Journals Complete
subjects Algorithms
Classification
Clustering
Decision making
Fuzzy c-means
Grey relational analysis
Knowledge discovery
Missing data
Missing data imputation
Mutual information
Quality
Regression
Regression models
title Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T14%3A13%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Missing%20value%20imputation%20using%20a%20novel%20grey%20based%20fuzzy%20c-means,%20mutual%20information%20based%20feature%20selection,%20and%20regression%20model&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Sefidian,%20Amir%20Masoud&rft.date=2019-01&rft.volume=115&rft.spage=68&rft.epage=94&rft.pages=68-94&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2018.07.057&rft_dat=%3Cproquest_cross%3E2131209906%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2131209906&rft_id=info:pmid/&rft_els_id=S0957417418304822&rfr_iscdi=true