Optimal subsampling for generalized additive models on large-scale datasets
Published in: Statistics and Computing, 2025, Vol. 35 (1)
Main authors:
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: In the age of big data, the efficient analysis of vast datasets is paramount, yet hindered by computational limitations such as memory constraints and processing time. To tackle these obstacles, conventional approaches often resort to parallel and distributed computing methodologies. In this study, we present an innovative statistical approach that exploits an optimized subsampling technique tailored for generalized additive models (GAM). Our approach harnesses the versatile modeling capabilities of GAM while alleviating computational burdens and enhancing the precision of parameter estimation. Through simulations and a practical application, we illustrate the efficacy of our method. Furthermore, we provide theoretical support by establishing convergence assurances and elucidating the asymptotic properties of our estimators. Our findings indicate that our approach surpasses uniform sampling in accuracy, while significantly reducing computational time in comparison to utilizing complete large-scale datasets.
ISSN: 0960-3174, 1573-1375
DOI: 10.1007/s11222-024-10546-x
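To make the abstract's two-step subsampling idea concrete, the sketch below illustrates a generic "pilot then informative subsample" workflow for an additive model on a large simulated dataset. It is only a rough illustration under stated assumptions: a Gaussian response, a truncated-power spline basis standing in for GAM smooth terms, and residual-based sampling scores as a heuristic for the optimal probabilities. None of these choices, names, or formulas are taken from the paper itself.

```python
# Hypothetical sketch of two-step subsampling for an additive model.
# The basis, the sampling scores, and all tuning constants are illustrative
# assumptions, not the authors' algorithm.
import numpy as np

rng = np.random.default_rng(0)

def spline_basis(x, knots):
    """Truncated-power cubic basis: a simple stand-in for a GAM smooth term."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def wls(X, y, w):
    """Weighted least squares with a tiny ridge term for numerical stability."""
    A = X.T @ (w[:, None] * X) + 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ (w * y))

# Simulated large dataset: y = f(x) + noise with a nonlinear f.
n = 1_000_000
x = rng.uniform(-3, 3, n)
y = np.sin(2 * x) + 0.5 * x + rng.normal(scale=0.5, size=n)
knots = np.quantile(x, np.linspace(0.1, 0.9, 8))
X = spline_basis(x, knots)              # full design matrix of basis functions

# Step 1: small uniform pilot subsample -> pilot coefficient estimate.
r0 = 2_000
pilot_idx = rng.choice(n, r0, replace=False)
beta_pilot = wls(X[pilot_idx], y[pilot_idx], np.ones(r0))

# Step 2: subsampling probabilities proportional to residual-weighted row norms
# (an assumed gradient-norm-style heuristic, used here purely for illustration).
resid = y - X @ beta_pilot
scores = np.abs(resid) * np.linalg.norm(X, axis=1)
probs = scores / scores.sum()

# Step 3: draw the informative subsample and refit with inverse-probability
# weights so the subsample estimator targets the full-data fit.
r = 10_000
idx = rng.choice(n, r, replace=True, p=probs)
beta_sub = wls(X[idx], y[idx], 1.0 / (r * probs[idx]))

print("pilot estimate (first 3 coefs):    ", beta_pilot[:3])
print("subsample estimate (first 3 coefs):", beta_sub[:3])
```

The design point the abstract emphasizes is visible here: only the pilot fit and the final weighted fit require model fitting, while the expensive full-data pass is reduced to computing sampling scores, which is why informative subsampling can beat uniform sampling in accuracy at a comparable computational budget.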