Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model
• A new hybrid method for the imputation of missing values is proposed.
• The method is based on a novel fuzzy c-means, mutual information, and regression.
• Performance of imputation increases by using Grey in the fuzzy c-means algorithm.
• The proposed method outperforms five existing imputation methods,...
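The highlights above credit the use of grey relational analysis inside fuzzy c-means for the gain in imputation performance. As a rough illustration of what a grey relational grade measures, here is a minimal sketch in Python. It uses the standard textbook formulation (distinguishing coefficient `zeta`, commonly 0.5) and assumes features already scaled to [0, 1]; the function name and the exact variant are assumptions, not the authors' implementation.

```python
def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate row with respect to `reference`.

    Assumes all features are already scaled to a comparable range.
    """
    # Absolute difference per feature between the reference and each candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, row)] for row in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate coincides with the reference.
        return [1.0] * len(candidates)
    # Grey relational coefficient per feature, averaged into one grade per row.
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]


# Hypothetical toy data: one reference instance and three candidates.
grades = grey_relational_grades(
    [0.0, 0.0],
    [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]],
)
```

A grade of 1 means the candidate matches the reference on every feature; lower grades mean weaker similarity. Unlike a Minkowski distance, the grade is computed relative to the smallest and largest differences seen across all compared instances.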
Published in: | Expert systems with applications 2019-01, Vol.115, p.68-94 |
---|---|
Main authors: | Sefidian, Amir Masoud ; Daneshpour, Negin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 94 |
---|---|
container_issue | |
container_start_page | 68 |
container_title | Expert systems with applications |
container_volume | 115 |
creator | Sefidian, Amir Masoud ; Daneshpour, Negin |
description | • A new hybrid method for the imputation of missing values is proposed.
• The method is based on a novel fuzzy c-means, mutual information, and regression.
• Imputation performance increases when grey relational analysis is used in the fuzzy c-means algorithm.
• The proposed method outperforms five existing imputation methods in most cases.
• The proposed method can also provide high classification accuracies.
The presence of missing values in real-world data is not only a prevalent problem but also an inevitable one. Therefore, missing values should be handled carefully before the mining or learning process. This paper proposes a novel technique to impute missing data. It employs a new version of the fuzzy c-means clustering algorithm that benefits from the advantages of the grey relational grade over Minkowski-like similarity measures. To impute a missing value more accurately, it also performs a local, mutual-information-based feature selection in each cluster to select only highly relevant features. Briefly, missing values are imputed in the following steps. First, the algorithm determines the importance of each missing attribute. Next, input instances are separated into several fuzzy clusters. Then, the algorithm selects the clusters that satisfy a minimum condition. After that, it chooses highly dependent features of the instances within each cluster using a mutual-information-based feature selection approach. Once the features are selected, regression models are applied to the selected features of the selected clusters to provide estimates of a missing value. Finally, the missing value is imputed as a weighted average of the estimates obtained in the previous step.
Three well-known evaluation criteria and the accuracy of a classification task are used to assess the performance of the proposed method. The experimental results for seven UCI data sets with different missing ratios and strategies indicate that the proposed algorithm generally outperforms five other imputation methods. |
doi_str_mv | 10.1016/j.eswa.2018.07.057 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2131209906</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417418304822</els_id><sourcerecordid>2131209906</sourcerecordid><originalsourceid>FETCH-LOGICAL-c376t-51ad91affd8d2bc0a68b242eaeb129389b2888cfecf8d285fbe6c9cc8687c86c3</originalsourceid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2131209906</pqid></control><display><type>article</type><title>Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Sefidian, Amir Masoud ; Daneshpour, Negin</creator><creatorcontrib>Sefidian, Amir Masoud ; Daneshpour, Negin</creatorcontrib><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2018.07.057</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Algorithms ; Classification ; Clustering ; Decision making ; Fuzzy c-means ; Grey relational analysis ; Knowledge discovery ; Missing data ; Missing data imputation ; Mutual information ; Quality ; Regression ; Regression models</subject><ispartof>Expert systems with applications, 2019-01, Vol.115, p.68-94</ispartof><rights>2018 Elsevier Ltd</rights><rights>Copyright Elsevier BV Jan 2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c376t-51ad91affd8d2bc0a68b242eaeb129389b2888cfecf8d285fbe6c9cc8687c86c3</citedby><cites>FETCH-LOGICAL-c376t-51ad91affd8d2bc0a68b242eaeb129389b2888cfecf8d285fbe6c9cc8687c86c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2018.07.057$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Sefidian, Amir Masoud</creatorcontrib><creatorcontrib>Daneshpour, Negin</creatorcontrib><title>Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model</title><title>Expert systems with applications</title><subject>Algorithms</subject><subject>Classification</subject><subject>Clustering</subject><subject>Decision making</subject><subject>Fuzzy c-means</subject><subject>Grey relational analysis</subject><subject>Knowledge discovery</subject><subject>Missing data</subject><subject>Missing data imputation</subject><subject>Mutual information</subject><subject>Quality</subject><subject>Regression</subject><subject>Regression models</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><startdate>201901</startdate><enddate>201901</enddate><creator>Sefidian, Amir Masoud</creator><creator>Daneshpour, Negin</creator><general>Elsevier Ltd</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201901</creationdate><title>Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model</title><author>Sefidian, Amir Masoud ; Daneshpour, Negin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c376t-51ad91affd8d2bc0a68b242eaeb129389b2888cfecf8d285fbe6c9cc8687c86c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Clustering</topic><topic>Decision making</topic><topic>Fuzzy c-means</topic><topic>Grey relational analysis</topic><topic>Knowledge discovery</topic><topic>Missing data</topic><topic>Missing data imputation</topic><topic>Mutual information</topic><topic>Quality</topic><topic>Regression</topic><topic>Regression models</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sefidian, Amir Masoud</creatorcontrib><creatorcontrib>Daneshpour, Negin</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sefidian, Amir Masoud</au><au>Daneshpour, Negin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model</atitle><jtitle>Expert systems with applications</jtitle><date>2019-01</date><risdate>2019</risdate><volume>115</volume><spage>68</spage><epage>94</epage><pages>68-94</pages><issn>0957-4174</issn><eissn>1873-6793</eissn><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2018.07.057</doi><tpages>27</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0957-4174 |
ispartof | Expert systems with applications, 2019-01, Vol.115, p.68-94 |
issn | 0957-4174 ; 1873-6793 |
language | eng |
recordid | cdi_proquest_journals_2131209906 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Algorithms ; Classification ; Clustering ; Decision making ; Fuzzy c-means ; Grey relational analysis ; Knowledge discovery ; Missing data ; Missing data imputation ; Mutual information ; Quality ; Regression ; Regression models |
title | Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T14%3A13%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Missing%20value%20imputation%20using%20a%20novel%20grey%20based%20fuzzy%20c-means,%20mutual%20information%20based%20feature%20selection,%20and%20regression%20model&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Sefidian,%20Amir%20Masoud&rft.date=2019-01&rft.volume=115&rft.spage=68&rft.epage=94&rft.pages=68-94&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2018.07.057&rft_dat=%3Cproquest_cross%3E2131209906%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2131209906&rft_id=info:pmid/&rft_els_id=S0957417418304822&rfr_iscdi=true |
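The final steps named in the abstract (select the clusters that satisfy a minimum condition, obtain one regression estimate per selected cluster, then take a weighted average) can be pictured with a small Python sketch. The record does not say what the minimum condition or the weights are, so this example assumes both are based on the fuzzy membership degree of the incomplete instance in each cluster. The function name, the `min_membership` threshold, and the sample numbers are hypothetical.

```python
def impute_weighted(estimates, memberships, min_membership=0.2):
    """Combine per-cluster estimates of one missing value into a single value.

    estimates:   one regression estimate per cluster (assumed already computed)
    memberships: fuzzy membership degree of the incomplete instance per cluster
    Assumption: a cluster is "selected" when its membership reaches the threshold,
    and the same membership is then used as its weight.
    """
    selected = [
        (estimate, weight)
        for estimate, weight in zip(estimates, memberships)
        if weight >= min_membership
    ]
    if not selected:
        raise ValueError("no cluster satisfies the minimum condition")
    total_weight = sum(weight for _, weight in selected)
    # Membership-weighted average over the selected clusters only.
    return sum(estimate * weight for estimate, weight in selected) / total_weight


# Hypothetical example: the third cluster falls below the threshold and is ignored.
value = impute_weighted([10.0, 20.0, 100.0], [0.6, 0.3, 0.1])
```

In this example only the first two clusters contribute, so the outlying estimate from the weakly related third cluster does not distort the imputed value.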