Is my Driver Observation Model Overconfident? Input-guided Calibration Networks for Reliable and Interpretable Confidence Estimates
| | |
|---|---|
| Main authors: | , , , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
Abstract:

Driver observation models are rarely deployed under perfect conditions. In practice, illumination, camera placement and camera type differ from those present during training, and unforeseen behaviours may occur at any time. While observing the human behind the steering wheel leads to more intuitive human-vehicle interaction and safer driving, it requires recognition algorithms which not only predict the correct driver state, but also determine their prediction quality through realistic and interpretable confidence measures. Reliable uncertainty estimates are crucial for building trust, and their absence is a serious obstacle to deploying activity recognition networks in real driving systems. In this work, we examine for the first time how well the confidence values of modern driver observation models match the probability of the correct outcome, and show that raw neural-network-based approaches tend to significantly overestimate their prediction quality. To correct this misalignment between the confidence values and the actual uncertainty, we consider two strategies. First, we enhance two activity recognition models often used for driver observation with temperature scaling, an off-the-shelf method for confidence calibration in image classification. Then, we introduce Calibrated Action Recognition with Input Guidance (CARING), a novel approach that leverages an additional neural network to learn to scale the confidences depending on the video representation. Extensive experiments on the Drive&Act dataset demonstrate that both strategies drastically improve the quality of model confidences, while our CARING model outperforms both the original architectures and their temperature-scaled counterparts, leading to the best uncertainty estimates.
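To make the calibration problem and the first strategy concrete, the sketch below (PyTorch, not the authors' code) shows how miscalibration is commonly measured with the expected calibration error (ECE) and then reduced with temperature scaling on held-out logits. The tensor names `val_logits`, `val_labels`, `test_logits`, `test_labels`, the 15-bin ECE and the LBFGS optimiser are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def expected_calibration_error(logits, labels, n_bins=15):
    """ECE: bin-weighted gap between mean confidence and accuracy."""
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    correct = pred.eq(labels).float()
    ece = torch.zeros(1)
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = (conf[in_bin].mean() - correct[in_bin].mean()).abs()
            ece += gap * in_bin.float().mean()
    return ece.item()

def fit_temperature(val_logits, val_labels):
    """Learn one scalar T by minimising the NLL on detached, held-out
    validation logits; at test time the logits are divided by T."""
    log_t = nn.Parameter(torch.zeros(1))      # T = exp(log_t) stays positive
    nll = nn.CrossEntropyLoss()
    opt = optim.LBFGS([log_t], lr=0.05, max_iter=100)

    def closure():
        opt.zero_grad()
        loss = nll(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# Hypothetical usage with pre-computed, detached logits and labels:
# T = fit_temperature(val_logits, val_labels)
# print(expected_calibration_error(test_logits, test_labels),
#       expected_calibration_error(test_logits / T, test_labels))
```

Since dividing all logits by the same positive constant does not change the arg-max, temperature scaling learned this way changes only the confidences, never the predicted class or the accuracy.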
DOI: 10.48550/arxiv.2204.04674
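The input-guided calibration idea behind CARING, as described in the abstract, can be pictured as replacing the single global temperature with a small auxiliary network that maps the video representation to an instance-specific temperature. The sketch below is a schematic reconstruction of that idea, not the published implementation; the feature dimension, layer sizes and the softplus-plus-one constraint are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class InputGuidedCalibrator(nn.Module):
    """Schematic input-guided calibration in the spirit of CARING:
    an auxiliary MLP predicts a per-sample temperature from the video
    feature vector and rescales the classifier logits with it."""

    def __init__(self, feature_dim=512, hidden_dim=128):
        super().__init__()
        self.temperature_net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, logits, features):
        # Softplus + 1 keeps the predicted temperature >= 1, so this
        # calibrator can only soften (never sharpen) the confidences.
        t = nn.functional.softplus(self.temperature_net(features)) + 1.0
        return logits / t

# Hypothetical usage: `features` is the backbone's video embedding and
# `logits` its classification output for a batch of driver-activity clips.
# calibrator = InputGuidedCalibrator(feature_dim=512)
# calibrated_probs = torch.softmax(calibrator(logits, features), dim=1)
```

In this sketch the calibrator would be trained like the global temperature above, i.e. by minimising the NLL of the rescaled logits on held-out data while the frozen backbone supplies the features and logits.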