Evaluation of the Effect of Missing Data on the Estimation of the Analysis : A Simulation Example Using Epidemiological Survey Data
Published in: | Japan society of veterinary epidemiology 2016/12/20, Vol.20(2), pp.111-117 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Missing data hinder epidemiological data analysis because they reduce statistical power and can produce biased estimates. Traditional methods for dealing with missing data, such as list-wise deletion (complete case analysis) and overall mean imputation, are known to produce biased estimates in some situations. To address these limitations, multiple imputation is becoming popular for handling missing data. In this study, simulated data were analyzed to examine the influence of missing data on analysis estimates by comparing list-wise deletion and multiple imputation. For this purpose, empirical epidemiological survey data on farm management practices in 563 dairy farms, collected to investigate risk factors associated with bovine leukemia virus infection, were used to create the simulated datasets with missing values. Missing data mechanisms are classified into 3 categories based on how the probability of missing values relates to the data: (1) missing completely at random (MCAR), in which the probability of being missing is a completely random event; (2) missing at random (MAR), in which the probability of being missing depends only on the observed data; and (3) not missing at random (NMAR), in which the probability of being missing depends on unobserved data or on a variable that is itself missing. Five missing data scenarios with different missing data mechanisms and varied proportions of missing values were examined in this study. For each scenario, 100 simulated datasets were generated from the empirical data. For each simulated dataset, list-wise deletion and multiple imputation were performed, and the logistic regression coefficients estimated for bovine leukemia virus infection were compared. Under every missing data mechanism, coefficient estimates obtained by list-wise deletion were less precise than those obtained by multiple imputation. Under the MCAR assumption, list-wise deletion produced less precise estimates as the proportion of missing data increased, and under the MAR and NMAR assumptions it led to biased estimates. Meanwhile, multiple imputation produced less bias and greater precision under the MCAR and MAR assumptions. However, biased estimates were observed in the results of multiple imputation under the NMAR assumption. This study demonstrated that missing data induced less precision and biased estimates in analyzing epidemiological data and also showed the practical utility of multiple imputation methods to improve th |
---|---|
ISSN: | 1343-2583 1881-2562 |
DOI: | 10.2743/jve.20.111 |
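
The three missing data mechanisms named in the summary can be illustrated with a small toy simulation. The sketch below is not the study's design: the variables `x` and `y`, the sample size, and the logistic missingness rules are all assumptions chosen for illustration, and it compares only the complete-case (list-wise deletion) mean of `y` rather than logistic regression coefficients or multiple imputation.

```python
import math
import random
import statistics


def simulate(n=20000, seed=0):
    """Draw hypothetical (x, y) pairs in which y depends on x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def observed_y(rows, mechanism, seed=1):
    """Return the y values that stay observed under one missingness mechanism."""
    rng = random.Random(seed)
    kept = []
    for x, y in rows:
        if mechanism == "MCAR":
            p_missing = 0.5                # unrelated to any data value
        elif mechanism == "MAR":
            p_missing = logistic(2.0 * x)  # depends only on the observed x
        elif mechanism == "NMAR":
            p_missing = logistic(2.0 * y)  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            kept.append(y)
    return kept


rows = simulate()
full_mean = statistics.fmean(y for _, y in rows)

# List-wise deletion: summarize only the records whose y is still observed.
complete_case_mean = {
    m: statistics.fmean(observed_y(rows, m)) for m in ("MCAR", "MAR", "NMAR")
}

print(f"full data mean of y: {full_mean:.3f}")
for mechanism, mean in complete_case_mean.items():
    print(f"{mechanism}: complete-case mean of y = {mean:.3f}")
```

In this toy setup the complete-case mean stays close to the full-data mean under MCAR and shifts downward under MAR and NMAR, because records with large `x` or large `y` are dropped more often. Under MAR the shift could in principle be corrected using the observed `x`, which is the information multiple imputation draws on; under NMAR the observed data alone do not carry that information.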