Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Saved in:

| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Order full text |
Abstract: Modern machine learning methods, including deep learning, have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive *uncertainty*. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors, including sample bias and non-stationarity. In such settings, well-calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, both Bayesian and non-Bayesian, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous large-scale empirical comparison of these methods under dataset shift. We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.
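
The abstract mentions both calibration and methods that marginalize over models (e.g., ensembles of networks). As an illustration only, and not the paper's benchmark code, the sketch below averages the predictive distributions of ensemble members and estimates calibration with the expected calibration error (ECE), a standard calibration metric; the function names, bin count, and synthetic data are assumptions made for the example.

```python
# A minimal NumPy sketch (not the paper's benchmark code) illustrating two ideas
# from the abstract: marginalizing over models by averaging ensemble member
# probabilities, and measuring calibration with the expected calibration error (ECE).
# Function names, the bin count, and the synthetic data are illustrative choices.

import numpy as np


def ensemble_predict(member_probs):
    """Average per-member predictive distributions.

    member_probs: array of shape (n_members, n_examples, n_classes), each slice
    a softmax output. Averaging is a simple way to marginalize over models.
    """
    return np.mean(member_probs, axis=0)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence and compare accuracy to confidence.

    probs: (n_examples, n_classes) predictive probabilities.
    labels: (n_examples,) integer class labels.
    """
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of examples in bin
    return ece


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake softmax outputs from a 5-member ensemble: 1000 examples, 10 classes.
    logits = rng.normal(size=(5, 1000, 10))
    member_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    labels = rng.integers(0, 10, size=1000)

    probs = ensemble_predict(member_probs)
    print("ECE of ensemble mean:", expected_calibration_error(probs, labels))
```

In the setting the abstract describes, a metric like this would be computed on test sets at increasing levels of dataset shift to see whether a model's confidence degrades gracefully along with its accuracy.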
DOI: 10.48550/arxiv.1906.02530