Improving the Representativeness of a Simple Random Sample: An Optimization Model and Its Application to the Continuous Sample of Working Lives

This paper proposes an optimization model for selecting a larger subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Mathematics (Basel) 2020-08, Vol.8 (8), p.1225
Hauptverfasser: Núñez-Antón, Vicente, Pérez-Salamero González, Juan Manuel, Regúlez-Castillo, Marta, Vidal-Meliá, Carlos
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper proposes an optimization model for selecting a larger subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex MINLP) and is, therefore, NP-hard. However, the solution is found by maximizing the size of the subsample taken from a stratified random sample with proportional allocation and restricting it to a p-value large enough to achieve a good fit to the population of interest using Pearson’s chi-square goodness-of-fit test. The paper also applies the model to the Continuous Sample of Working Lives (CSWL), which is a set of anonymized microdata containing information on individuals from Spanish Social Security records and the results prove that it is possible to obtain a larger subsample from the CSWL that (far) better represents the pensioner population for each of the waves analyzed.
ISSN:2227-7390
2227-7390
DOI:10.3390/math8081225