Improving the Representativeness of a Simple Random Sample: An Optimization Model and Its Application to the Continuous Sample of Working Lives

This paper proposes an optimization model for selecting a larger subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Mathematics (Basel) 2020-08, Vol.8 (8), p.1225
Hauptverfasser:	Núñez-Antón, Vicente, Pérez-Salamero González, Juan Manuel, Regúlez-Castillo, Marta, Vidal-Meliá, Carlos
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Chi-square test continuous sample of working lives Food science Goodness of fit Nonlinear programming Optimization Optimization models p-value Researchers Social security Statistical tests subsampling Variables
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This paper proposes an optimization model for selecting a larger subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex MINLP) and is, therefore, NP-hard. However, the solution is found by maximizing the size of the subsample taken from a stratified random sample with proportional allocation and restricting it to a p-value large enough to achieve a good fit to the population of interest using Pearson’s chi-square goodness-of-fit test. The paper also applies the model to the Continuous Sample of Working Lives (CSWL), which is a set of anonymized microdata containing information on individuals from Spanish Social Security records and the results prove that it is possible to obtain a larger subsample from the CSWL that (far) better represents the pensioner population for each of the waves analyzed.
ISSN:	2227-7390 2227-7390
DOI:	10.3390/math8081225