Imputing missing value through ensemble concept based on statistical measures

Bibliographic details
Published in: Knowledge and information systems, 2018-07, Vol. 56 (1), p. 123-139
Main authors: Jenghara, Moslem Mohammadi; Ebrahimpour-Komleh, Hossein; Rezaie, Vahideh; Nejatian, Samad; Parvin, Hamid; Yusof, Sharifah Kamilah Syed
Format: Article
Language: English
Online access: Full text
Description
Summary: Many datasets include missing values in their attributes. Data mining techniques generally cannot be applied directly in the presence of missing values, so missing value management is an important preprocessing step in a data mining task. One of the most important categories of missing value management techniques is missing value imputation. This paper presents a new imputation technique based on statistical measures. The technique employs an ensemble of estimators, built separately from the positively and the negatively correlated observed attributes, to estimate each missing value. Each estimator guesses a value for a missing entry based on the mean and variance of that feature, both of which are estimated from the feature's non-missing values. The final consensus value for a missing entry is the weighted aggregation of the values produced by the different estimators. The primary weight is the attribute correlation; the secondary weight depends on kernel functions such as kurtosis, skewness, the number of samples involved, and combinations of these. For evaluation, missing values are deliberately introduced at random at different levels. The experiments indicate that the proposed technique achieves good accuracy compared with the classical methods.
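The summary gives no formulas, so the following Python sketch only illustrates the general idea and should not be read as the authors' algorithm. It assumes a simple regression-style estimator per observed attribute (feature mean plus correlation times feature standard deviation times the z-score of the observed attribute), uses the absolute correlation as the only weight, and omits the secondary kernel weight (kurtosis, skewness, sample count) entirely. The function name `ensemble_impute` is hypothetical.

```python
import numpy as np

def ensemble_impute(X):
    """Fill NaNs in a 2-D array with correlation-weighted estimates (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Mean and standard deviation of each feature, from its non-missing values only.
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)

    # Pairwise Pearson correlation, computed on rows where both attributes are observed.
    corr = np.zeros((d, d))
    for j in range(d):
        for k in range(d):
            if j == k:
                continue
            both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            if both.sum() > 2 and np.std(X[both, j]) > 0 and np.std(X[both, k]) > 0:
                corr[j, k] = np.corrcoef(X[both, j], X[both, k])[0, 1]

    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        groups = []
        # Assumption: one sub-ensemble for positively and one for negatively correlated attributes.
        for sign in (1, -1):
            ks = [k for k in range(d)
                  if k != j and not np.isnan(X[i, k])
                  and sign * corr[j, k] > 0 and std[k] > 0]
            if not ks:
                continue
            w = np.abs(corr[j, ks])                      # primary weight: attribute correlation
            z = (X[i, ks] - mean[ks]) / std[ks]          # z-score of each observed attribute
            guesses = mean[j] + corr[j, ks] * std[j] * z  # one guess per estimator
            groups.append((w.sum(), np.average(guesses, weights=w)))
        if groups:
            weights, values = zip(*groups)
            filled[i, j] = np.average(values, weights=weights)
        else:
            filled[i, j] = mean[j]  # fallback: no usable correlated attribute in this row
    return filled
```

Calling `ensemble_impute` on an array whose missing entries are marked with `np.nan` returns a copy with those entries filled and all observed values left unchanged. Masking known values at random and comparing the imputed results against them would mirror the evaluation setup the summary describes.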
ISSN: 0219-1377, 0219-3116
DOI: 10.1007/s10115-017-1118-1