Estimating the size of hidden populations from register data

Prevalence estimates of drug use, or of its consequences, are considered important in many contexts and may have substantial influence over public policy. However, it is rarely possible to simply count the relevant individuals, in particular when the defining characteristics might be illegal, as in...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:BMC MEDICAL RESEARCH METHODOLOGY 2014-04, Vol.14 (1), p.58-58, Article 58
Hauptverfasser: Ledberg, Anders, Wennberg, Peter
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Prevalence estimates of drug use, or of its consequences, are considered important in many contexts and may have substantial influence over public policy. However, it is rarely possible to simply count the relevant individuals, in particular when the defining characteristics might be illegal, as in the drug use case. Consequently methods are needed to estimate the size of such partly 'hidden' populations, and many such methods have been developed and used within epidemiology including studies of alcohol and drug use. Here we introduce a method appropriate for estimating the size of human populations given a single source of data, for example entries in a health-care registry. The setup is the following: during a fixed time-period, e.g. a year, individuals belonging to the target population have a non-zero probability of being "registered". Each individual might be registered multiple times and the time-points of the registrations are recorded. Assuming that the population is closed and that the probability of being registered at least once is constant, we derive a family of maximum likelihood (ML) estimators of total population size. We study the ML estimator using Monte Carlo simulations and delimit the range of cases where it is useful. In particular we investigate the effect of making the population heterogeneous with respect to probability of being registered. The new estimator is asymptotically unbiased and we show that high precision estimates can be obtained for samples covering as little as 25% of the total population size. However, if the total population size is small (say in the order of 500) a larger fraction needs to be sampled to achieve reliable estimates. Further we show that the estimator give reliable estimates even when individuals differ in the probability of being registered. We also compare the ML estimator to an estimator known as Chao's estimator and show that the latter can have a substantial bias when applied to epidemiological data. The population size estimator suggested herein complements existing methods and is less sensitive to certain types of dependencies typical in epidemiological data.
ISSN:1471-2288
1471-2288
DOI:10.1186/1471-2288-14-58