Model-Robust Subdata Selection for Big Data

Bibliographic details
Published in:	Journal of statistical theory and practice 2021-12, Vol.15 (4), Article 82
Main authors:	Shi, Chenlu; Tang, Boxin
Format:	Article
Language:	English
Subjects:	Mathematics and Statistics; Original Article; Probability Theory and Stochastic Processes; Special Issue: State of the art in research on design and analysis of experiments; Statistical Theory and Methods; Statistics
Online access:	Full text

Description
Abstract:	Subdata selection is necessary because of challenges arising from statistical analysis of big data using limited computing resources. The existing work on subdata selection relies heavily on a specified model, which calls for an approach that is robust to model misspecification. We propose the use of space-filling designs for subdata selection and examine a fast algorithm for its implementation. Our algorithm performs surprisingly well when compared to the reference distribution given by complete search. Simulations are conducted to compare our approach with a recently introduced IBOSS method, and the results show that our method is not just robust to model misspecification but also robust to model uncertainty. While robustness to model misspecification and uncertainty may be expected due to the nature of space-filling designs, we discover that our method enjoys an additional property of robustness when there exist substantial correlations among covariates.
ISSN:	1559-8608, 1559-8616
DOI:	10.1007/s42519-021-00217-9
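
Illustration:	The abstract proposes selecting subdata so that the chosen rows fill the covariate space, but this record does not describe the authors' fast algorithm. The Python sketch below is therefore not their method. It swaps in a generic greedy farthest-point (maximin distance) heuristic, one common way to obtain a space-filling subset, purely to make the idea concrete. The function name `maximin_subdata`, the min-max scaling step, and the random starting row are all assumptions made for illustration.

```python
import numpy as np


def maximin_subdata(X, k, seed=0):
    """Greedy farthest-point selection of k distinct row indices from X.

    Hypothetical helper for illustration only; NOT the algorithm from the
    article, whose details are not given in this record.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    # Assumption: scale each covariate to [0, 1] so that no single
    # column dominates the Euclidean distances.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    Z = (X - lo) / span

    # Assumption: start from a randomly chosen row.
    rng = np.random.default_rng(seed)
    first = int(rng.integers(n))
    chosen = [first]

    # min_dist[i] holds the squared distance from row i to its nearest
    # chosen row; chosen rows are marked -1 so they are never re-selected.
    min_dist = np.sum((Z - Z[first]) ** 2, axis=1)
    min_dist[first] = -1.0

    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # row farthest from the current subset
        chosen.append(nxt)
        new_dist = np.sum((Z - Z[nxt]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, new_dist)
        min_dist[nxt] = -1.0

    return np.array(chosen)
```

Each step costs one pass over the full data, so selecting k rows from n takes on the order of n times k distance evaluations. Because the selection looks only at the covariates and never at a response or a fitted model, it does not depend on a specified model, which is the property the abstract emphasizes.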