Exploring the difficulty of estimating win probability: a simulation study
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Estimating win probability is one of the classic modeling tasks of sports
analytics. Many widely used win probability estimators use machine learning to
fit the relationship between a binary win/loss outcome variable and certain
game-state variables. To illustrate just how difficult it is to accurately fit
such a model from noisy and highly correlated observational data, in this paper
we conduct a simulation study. We create a simplified random walk version of
football in which true win probability at each game-state is known, and we see
how well a model recovers it. We find that the dependence structure of
observational play-by-play data substantially inflates the bias and variance of
estimators and lowers the effective sample size. This makes it essential to
quantify uncertainty in win probability estimates, but typical bootstrapped
confidence intervals are too narrow and don't achieve nominal coverage. Hence,
we introduce a novel method, the fractional bootstrap, to calibrate these
intervals to achieve adequate coverage. Our findings are not unique to the
particular application of estimating win probability; they are broadly
applicable across sports analytics, as myriad other sports datasets are
clustered into groups of observations that share the same outcome. |
---|---|
DOI: | 10.48550/arxiv.2406.16171 |
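
The summary describes a random walk game in which true win probability is known at every game-state. The following is a minimal sketch of that idea, not the paper's actual simulation: it assumes a symmetric walk in which each play moves the score differential by +1 or -1, with ties at the end settled by a fair coin. The names `true_win_prob`, `simulate_game`, and `empirical_win_rate` are hypothetical. It also makes visible the clustering the summary mentions: every row from one game carries the same win/loss label.

```python
import math
import random


def true_win_prob(score_diff: int, plays_left: int) -> float:
    """Exact win probability under the assumed symmetric +/-1 walk.

    A win means the final score differential is positive; a final
    differential of zero counts as half a win (fair coin tie-break).
    """
    total = 0.0
    for ups in range(plays_left + 1):
        final = score_diff + 2 * ups - plays_left
        p = math.comb(plays_left, ups) / 2 ** plays_left
        if final > 0:
            total += p
        elif final == 0:
            total += 0.5 * p
    return total


def simulate_game(n_plays: int, rng: random.Random):
    """Simulate one game; return rows of (score_diff, plays_left, win).

    The win label is decided once, at the end of the game, and copied
    onto every row, so rows within a game are strongly dependent.
    """
    diff = 0
    states = []
    for t in range(n_plays):
        states.append((diff, n_plays - t))
        diff += rng.choice((-1, 1))
    win = int(diff > 0) if diff != 0 else int(rng.random() < 0.5)
    return [(d, left, win) for d, left in states]


def empirical_win_rate(games, score_diff: int, plays_left: int) -> float:
    """Fraction of wins among all rows observed at one game-state."""
    labels = [
        win
        for game in games
        for d, left, win in game
        if d == score_diff and left == plays_left
    ]
    return sum(labels) / len(labels) if labels else float("nan")


if __name__ == "__main__":
    rng = random.Random(0)
    games = [simulate_game(20, rng) for _ in range(5000)]
    # Compare the known truth with a naive frequency estimate at one state.
    print("true WP     :", true_win_prob(2, 10))
    print("empirical WP:", empirical_win_rate(games, 2, 10))
```

Because the truth is available in closed form here, any fitted model or interval procedure can be scored against it directly. Resampling individual rows of such data treats them as independent, which they are not; resampling whole games is the usual alternative, and it is distinct from the fractional bootstrap that the summary introduces.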