Tree-Based Approach to Missing Data Imputation


Bibliographic details
Main authors: Vateekul, P., Sarinnapakorn, K.
Format: Conference proceeding
Language: English
Description
Summary: Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called "imputation tree" (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called a "missing pattern tree" (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree fitted to the observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only produces impressive accuracy but also provides information on the nature of missingness.
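
The two stages named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single target column `y` with missing entries (`None`), fully observed predictors `X`, and it replaces full trees with single-split trees (stumps) to stay short. The function names `best_split`, `gini`, `variance`, and `impute` are hypothetical.

```python
from statistics import mean


def best_split(X, y, impurity):
    """Return (feature, threshold) minimising weighted impurity, or None if X has no split."""
    best, best_score = None, float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = len(left) * impurity(left) + len(right) * impurity(right)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def variance(values):
    """Population variance, used as the regression impurity."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def impute(X, y):
    """Return a copy of y with None entries filled in (assumed simplification of ITree)."""
    observed_all = [i for i, v in enumerate(y) if v is not None]
    if not observed_all:
        raise ValueError("at least one observed target value is required")

    # Stage 1: a stump standing in for the "missing pattern tree".
    # It classifies the missingness indicator from the predictors.
    missing = [int(v is None) for v in y]
    split = best_split(X, missing, gini)
    if split is None:
        nodes = [list(range(len(X)))]
    else:
        j, t = split
        nodes = [[i for i in range(len(X)) if X[i][j] <= t],
                 [i for i in range(len(X)) if X[i][j] > t]]

    # Stage 2: inside each terminal node, a regression stump fitted on the
    # observed rows of that node predicts the missing targets.
    filled = list(y)
    for node in nodes:
        mis = [i for i in node if y[i] is None]
        if not mis:
            continue
        obs = [i for i in node if y[i] is not None] or observed_all
        Xo, yo = [X[i] for i in obs], [y[i] for i in obs]
        rsplit = best_split(Xo, yo, variance)
        for i in mis:
            if rsplit is None:
                filled[i] = mean(yo)  # no usable split: fall back to the node mean
            else:
                rj, rt = rsplit
                same_side = [v for row, v in zip(Xo, yo)
                             if (row[rj] <= rt) == (X[i][rj] <= rt)]
                filled[i] = mean(same_side)
    return filled
```

Calling `impute(X, y)` leaves observed entries untouched and fills each `None` with the prediction of the regression stump in its terminal node. The split found in stage 1 also shows which predictor, if any, is associated with missingness, which is the kind of information the summary says the MPT provides.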
ISSN: 2375-9232, 2375-9259
DOI: 10.1109/ICDMW.2009.92