Online cross‐validation‐based ensemble learning
| Field | Value |
|---|---|
| Published in | Statistics in Medicine, 2018-01, Vol. 37 (2), pp. 249-260 |
| Main authors | |
| Format | Article |
| Language | English |
| Online access | Full text |
| Abstract | Online estimators update a current estimate with each new incoming batch of data without revisiting past data, thereby providing streaming estimates that scale to big data. We develop flexible, ensemble‐based online estimators of an infinite‐dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time‐series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross‐validation to identify the algorithm with the best performance. We show that an estimator based on the cross‐validation‐selected algorithm is asymptotically guaranteed to perform as well as the true, unknown best‐performing algorithm. We also provide extensions of this approach, including online estimation of the optimal ensemble of candidate online estimators. We demonstrate the strong performance of our methods using simulations and a real‐data example in which we make streaming predictions of infectious disease incidence using data from a large database. Copyright © 2017 John Wiley & Sons, Ltd. |
| ISSN | 0277-6715, 1097-0258 |
| DOI | 10.1002/sim.7320 |
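
The abstract's central mechanism can be sketched in Python: each candidate online estimator is scored on a new batch *before* that batch is used to update it, the losses are accumulated over time, and the candidate with the lowest accumulated (online cross-validated) loss is selected. This is a minimal illustration under stated assumptions, not the authors' implementation. The candidate classes `RunningMean` and `OnlineLinearSGD`, the `OnlineCVSelector` class, the squared-error loss, the choice to skip scoring the very first batch, and the simulated i.i.d. data are all hypothetical choices made for this example. It covers only the discrete selector, not the ensemble extension or the time-series setting described in the paper.

```python
import numpy as np


class RunningMean:
    """Hypothetical candidate: predicts the running mean of all outcomes seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def predict(self, X):
        mean = self.total / self.count if self.count else 0.0
        return np.full(len(X), mean)

    def update(self, X, y):
        self.total += float(np.sum(y))
        self.count += len(y)


class OnlineLinearSGD:
    """Hypothetical candidate: linear regression updated by one SGD pass per batch."""

    def __init__(self, n_features, lr=0.01):
        self.w = np.zeros(n_features + 1)  # last coefficient is the intercept
        self.lr = lr

    def _design(self, X):
        return np.column_stack([X, np.ones(len(X))])

    def predict(self, X):
        return self._design(X) @ self.w

    def update(self, X, y):
        # Each new observation is used once; past batches are never revisited.
        for z_i, y_i in zip(self._design(X), y):
            residual = z_i @ self.w - y_i
            self.w -= self.lr * residual * z_i


class OnlineCVSelector:
    """Hypothetical discrete selector based on online (prequential) cross-validation."""

    def __init__(self, candidates):
        self.candidates = candidates
        self.cum_loss = np.zeros(len(candidates))
        self.n_batches = 0

    def process_batch(self, X, y):
        # Score each candidate on the new batch before it is trained on it, so the
        # batch serves as a validation set for estimators fit only on past data.
        if self.n_batches > 0:
            for j, est in enumerate(self.candidates):
                self.cum_loss[j] += np.mean((est.predict(X) - y) ** 2)
        # Then let every candidate absorb the batch.
        for est in self.candidates:
            est.update(X, y)
        self.n_batches += 1

    def online_cv_risk(self):
        # Average validation loss over all batches that were scored.
        return self.cum_loss / max(self.n_batches - 1, 1)

    def selected(self):
        return int(np.argmin(self.cum_loss))

    def predict(self, X):
        return self.candidates[self.selected()].predict(X)


# Usage on simulated i.i.d. data: y = 2x + 1 + noise, arriving in batches of 20.
rng = np.random.default_rng(0)
selector = OnlineCVSelector([RunningMean(), OnlineLinearSGD(n_features=1)])
for _ in range(200):
    X = rng.normal(size=(20, 1))
    y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=20)
    selector.process_batch(X, y)

print("online CV risk per candidate:", selector.online_cv_risk())
print("selected candidate index:", selector.selected())
```

Because every loss is computed on data the candidate has not yet been trained on, the accumulated loss behaves like a cross-validated risk rather than a training error, and no past batch ever needs to be stored. On this simulated linear data the SGD candidate should be selected over the running mean.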