Pretest estimation in combining probability and non-probability samples
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Multiple heterogeneous data sources are becoming increasingly available for
statistical analyses in the era of big data. As an important example in
finite-population inference, we develop a unified framework of the
test-and-pool approach to general parameter estimation by combining
gold-standard probability and non-probability samples. We focus on the case
when the study variable is observed in both datasets for estimating the target
parameters, and each contains other auxiliary variables. Utilizing the
probability design, we conduct a pretest procedure to determine the
comparability of the non-probability data with the probability data and decide
whether or not to leverage the non-probability data in a pooled analysis. When
the probability and non-probability data are comparable, our approach combines
both datasets for efficient estimation. Otherwise, we retain only the probability
data for estimation. We also characterize the asymptotic distribution of the
proposed test-and-pool estimator under a local alternative and provide a
data-adaptive procedure to select the critical tuning parameters that target
the smallest mean squared error of the test-and-pool estimator. Lastly, to deal
with the non-regularity of the test-and-pool estimator, we construct a robust
confidence interval that has good finite-sample coverage. |
---|---|
DOI: | 10.48550/arxiv.2305.17801 |
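
The test-and-pool logic described in the summary can be illustrated with a toy sketch for a population mean. This is not the authors' estimator: the function names (`weighted_mean_and_variance`, `pool_after_pretest`), the Hajek-type weighted mean, the simple linearization variance, the equal weighting of the non-probability sample, the inverse-variance pooling rule, and the fixed chi-square(1) cutoff of 3.84 are all assumptions made for illustration. The article instead selects its tuning parameters data-adaptively and works with general parameters.

```python
def weighted_mean_and_variance(values, weights):
    """Hajek-type weighted mean with a simple linearization variance (assumed form)."""
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_weight
    variance = sum((w * (y - mean)) ** 2 for w, y in zip(weights, values)) / total_weight ** 2
    return mean, variance


def pool_after_pretest(prob_values, prob_weights, nonprob_values, critical_value=3.84):
    """Toy test-and-pool estimate of a population mean.

    Returns (estimate, pooled_flag, test_statistic).
    """
    mean_a, var_a = weighted_mean_and_variance(prob_values, prob_weights)
    # Assumption: the non-probability sample is treated as equally weighted.
    mean_b, var_b = weighted_mean_and_variance(nonprob_values, [1.0] * len(nonprob_values))
    if var_a <= 0 or var_b <= 0:
        raise ValueError("both samples need a positive variance estimate")

    # Pretest: squared standardized difference between the two sample estimates.
    test_statistic = (mean_b - mean_a) ** 2 / (var_a + var_b)
    if test_statistic > critical_value:
        # Samples look incomparable: keep only the probability-sample estimate.
        return mean_a, False, test_statistic

    # Samples look comparable: pool with inverse-variance weights (assumed rule).
    precision_a, precision_b = 1.0 / var_a, 1.0 / var_b
    pooled = (precision_a * mean_a + precision_b * mean_b) / (precision_a + precision_b)
    return pooled, True, test_statistic
```

For example, `pool_after_pretest([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [2.0, 4.0])` pools the two samples, whereas replacing the second sample with `[100.0, 101.0]` fails the pretest and returns the probability-sample mean alone. Because the final estimate depends on the outcome of a test, its sampling distribution is non-regular, which is the reason the summary mentions a robust confidence interval.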