Tree-Based Approach to Missing Data Imputation
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called "imputation tree" (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called "missing pattern tree" (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree of observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only produces an impressive accuracy, but also provides information on the nature of missingness. |
---|---|
ISSN: | 2375-9232 2375-9259 |
DOI: | 10.1109/ICDMW.2009.92 |
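
As a rough illustration of the two-stage idea in the summary (first model where values are missing, then estimate them within each resulting node), here is a minimal sketch in Python. It is not the authors' ITree implementation. It assumes a single, always-observed predictor `x`, and it replaces both the missing pattern tree and the per-node regression trees with depth-one splits (stumps). The function names `sse`, `best_threshold`, and `impute` are hypothetical, as is the fallback used when a node contains no observed values.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(xs, targets):
    """Return the threshold on xs that most reduces the total within-side
    SSE of targets, or None if no split reduces it.

    For a 0/1 target, within-side SSE equals n * p * (1 - p), which is
    proportional to the Gini impurity, so the same routine serves as a
    classification stump for the missingness indicator.
    """
    best_t, best_cost = None, sse(targets)
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2  # midpoint between neighbouring distinct values
        left = [y for x, y in zip(xs, targets) if x <= t]
        right = [y for x, y in zip(xs, targets) if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def impute(rows):
    """rows: list of (x, y) pairs where y may be None.

    Assumes at least one y is observed. Returns a new list with every
    missing y replaced by an estimate.
    """
    xs = [x for x, _ in rows]
    all_obs = [(x, y) for x, y in rows if y is not None]

    # Stage 1 (stand-in for the missing pattern tree): split the data on x
    # so that the two nodes differ as much as possible in missingness.
    is_missing = [1.0 if y is None else 0.0 for _, y in rows]
    t = best_threshold(xs, is_missing)
    node = [0 if t is None or x <= t else 1 for x in xs]

    filled = list(rows)
    for k in (0, 1):
        idx = [i for i, n in enumerate(node) if n == k]
        if not idx:
            continue
        # Observed rows in this node; hypothetical fallback to all observed
        # rows if the node has none.
        obs = [rows[i] for i in idx if rows[i][1] is not None] or all_obs

        # Stage 2 (stand-in for the per-node regression tree): fit a
        # regression stump on the node's observed rows.
        s = best_threshold([x for x, _ in obs], [y for _, y in obs])
        for i in idx:
            x, y = rows[i]
            if y is None:
                if s is None:
                    pred = mean(v for _, v in obs)
                else:
                    # Mean of the observed y values on the same side of s.
                    pred = mean(v for u, v in obs if (u <= s) == (x <= s))
                filled[i] = (x, pred)
    return filled


example_rows = [(1, 1.0), (2, 1.0), (3, 3.0), (4, None),
                (6, 10.0), (7, None), (8, None), (9, 12.0)]
example_filled = impute(example_rows)
```

In this toy data the missing values are concentrated at larger `x`, so the first split isolates that region and the estimates are drawn only from observed values inside it, rather than from the whole data set. A fuller version would grow both kinds of tree to greater depth over several predictors.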