Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainty, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved confounders and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.
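The pessimism idea sketched in the abstract lends itself to a short illustration. The Python snippet below is a toy rendering, not the authors' algorithm: it assumes a hypothetical bank of reward predictors that fit the observed (confounded) data equally well but disagree on counterfactual outcomes, takes the spread of their predictions as a stand-in for delphic uncertainty, and subtracts a multiple of that spread as a pessimism penalty. All names (`make_compatible_models`, `pessimistic_reward`, the sinusoidal reward family) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_compatible_models(n_models, confounding_strength):
    """Hypothetical reward predictors r_hat(s, a) that are equally consistent
    with the observed data but embody different assumptions about the hidden
    confounder (modeled here as a simple additive offset)."""
    offsets = rng.uniform(-confounding_strength, confounding_strength, n_models)
    return [lambda s, a, b=b: np.sin(s) + a + b for b in offsets]

models = make_compatible_models(n_models=8, confounding_strength=0.5)

def delphic_uncertainty(s, a):
    """Spread of predictions across observationally equivalent world models.
    (Epistemic uncertainty would instead be spread across bootstrap fits of a
    single model class; aleatoric uncertainty is the outcome noise itself.)"""
    preds = np.array([m(s, a) for m in models])
    return preds.std()

def pessimistic_reward(s, a, lam=1.0):
    """Mean prediction penalized by lam times the delphic spread, in the
    spirit of a pessimistic offline RL objective."""
    preds = np.array([m(s, a) for m in models])
    return preds.mean() - lam * delphic_uncertainty(s, a)

print(f"pessimistic reward at (s=0.3, a=1.0): {pessimistic_reward(0.3, 1.0):.3f}")
```

A real implementation would learn the set of compatible world models from data and fold the penalty into value estimation during policy optimization, rather than penalizing a closed-form reward as done here.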
DOI: 10.48550/arxiv.2306.01157