The data quality concept of accuracy in the context of publicly shared data sets

Bibliographic Details
Published in: Wirtschafts- und Sozialstatistisches Archiv, June 2009, Vol. 3 (1), pp. 67-80
Authors: Kuchler, Carsten; Spiess, Martin
Format: Article
Language: English
Online access: Full text
Description
Abstract: Along with other data quality dimensions, the concept of accuracy is often used to describe the quality of a particular data set. However, its basic definition refers to the statistical properties of estimators, which can hardly be proved by means of a single survey alone. This ambiguity can be resolved by assigning “accuracy” to survey processes that are known to affect these properties. In this contribution, we consider the sub-process of imputation as one important step in setting up a data set and argue that criteria like the so-called “hit-rate” criterion, which is intended to measure the accuracy of a data set by some distance function between the “true” but unobserved values and the imputed values, are neither required nor desirable. In contrast, the so-called “inference” criterion allows statements on the validity of inferences based on a suitably completed data set under rather general conditions. The underlying theoretical concepts are illustrated by means of a simulation study. It is emphasised that the same arguments apply to other survey processes that introduce uncertainty into an edited data set.
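
To make the contrast drawn in the abstract concrete, the following minimal sketch (not the paper's own simulation study; all settings such as sample size, missing rate, and number of imputations are illustrative assumptions) compares two ways of completing a data set with values missing completely at random: mean imputation, which minimises the expected distance between imputed and true values and thus scores well under a “hit-rate” criterion, and proper multiple imputation pooled with Rubin's rules, which is geared towards the “inference” criterion.

    import numpy as np

    # Illustrative settings (assumptions, not taken from the paper).
    rng = np.random.default_rng(0)
    n, p_miss, M, n_rep = 200, 0.3, 20, 2000  # sample size, missing rate, imputations, replications
    mu, sigma = 0.0, 1.0                      # true mean and standard deviation

    hit_rmse = {"mean": [], "multiple": []}
    covered = {"mean": 0, "multiple": 0}

    for _ in range(n_rep):
        y = rng.normal(mu, sigma, n)
        miss = rng.random(n) < p_miss         # missing completely at random
        y_obs = y[~miss]
        n_obs, n_mis = y_obs.size, int(miss.sum())
        ybar, s = y_obs.mean(), y_obs.std(ddof=1)

        # Mean imputation: smallest expected distance to the unobserved
        # true values, but the completed data understate the variance.
        y_mean = y.copy()
        y_mean[miss] = ybar
        hit_rmse["mean"].append(np.sqrt(np.mean((y_mean[miss] - y[miss]) ** 2)))
        se = y_mean.std(ddof=1) / np.sqrt(n)  # treats imputed values as observed
        covered["mean"] += (abs(y_mean.mean() - mu) <= 1.96 * se)

        # Proper multiple imputation: draw each missing value from a
        # predictive distribution, then pool with Rubin's rules.
        q, u, d = [], [], []
        for _ in range(M):
            mu_m = rng.normal(ybar, s / np.sqrt(n_obs))  # parameter uncertainty
            y_imp = y.copy()
            y_imp[miss] = rng.normal(mu_m, s, n_mis)
            d.append(np.sqrt(np.mean((y_imp[miss] - y[miss]) ** 2)))
            q.append(y_imp.mean())
            u.append(y_imp.var(ddof=1) / n)
        qbar, W, B = np.mean(q), np.mean(u), np.var(q, ddof=1)
        T = W + (1 + 1 / M) * B               # Rubin's total variance
        covered["multiple"] += (abs(qbar - mu) <= 1.96 * np.sqrt(T))  # normal approx. for brevity
        hit_rmse["multiple"].append(np.mean(d))

    for k in ("mean", "multiple"):
        print(f"{k:8s}  RMSE to true values: {np.mean(hit_rmse[k]):.3f}  "
              f"95% CI coverage: {covered[k] / n_rep:.3f}")

In a typical run of this sketch, mean imputation attains the smaller distance to the unobserved true values (RMSE near sigma), but its intervals cover the true mean well below the nominal 95%; multiple imputation pays a larger per-value distance (roughly sqrt(2) times sigma) yet keeps coverage close to the nominal level. This is exactly the sense in which a hit-rate criterion is “neither required nor desirable” for valid inference.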
ISSN: 1863-8155, 1863-8163
DOI: 10.1007/s11943-009-0056-0