CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments
Main authors: | , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Robots operating in real-world environments must reason about possible
outcomes of stochastic actions and make decisions based on partial observations
of the true world state. A major challenge for making accurate and robust
action predictions is the problem of confounding, which, if left untreated, can
lead to prediction errors. The partially observable Markov decision process
(POMDP) is a widely used framework for modelling these stochastic and
partially-observable decision-making problems. However, due to a lack of
explicit causal semantics, POMDP planning methods are prone to confounding bias
and thus, in the presence of unobserved confounders, may produce underperforming
policies. This paper presents a novel causally-informed extension of "anytime
regularized determinized sparse partially observable tree" (AR-DESPOT), a
modern anytime online POMDP planner, using causal modelling and inference to
eliminate errors caused by unmeasured confounder variables. We further propose
a method to learn offline the partial parameterisation of the causal model for
planning, from ground truth model data. We evaluate our methods on a toy
problem with an unobserved confounder and show that the learned causal model is
highly accurate, while our planning method is more robust to confounding and
produces higher-performing policies overall than AR-DESPOT. |
DOI: | 10.48550/arxiv.2304.06848 |
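
The confounding bias mentioned in the summary can be illustrated with a small exact calculation. The sketch below is not the paper's method or its toy problem: the variables `U` (hidden confounder), `A` (action), `Y` (outcome) and all probabilities are hypothetical values chosen for illustration. It compares the observational estimate P(Y=1 | A=a), which a model fitted to logged data without causal semantics would use, with the interventional quantity P(Y=1 | do(A=a)), which is what a planner actually needs. The interventional value is computed here from the full model, including `U`, purely to expose the gap.

```python
# Hypothetical toy model (assumed values, not taken from the paper):
# binary hidden confounder U, binary action A, binary outcome Y.
P_U = {0: 0.5, 1: 0.5}            # P(U = u)
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}   # P(A = 1 | U = u) in the logged data
P_Y1_GIVEN_AU = {                 # P(Y = 1 | A = a, U = u), keyed by (a, u)
    (0, 0): 0.4, (0, 1): 0.8,
    (1, 0): 0.2, (1, 1): 0.6,
}


def p_action(a, u):
    """P(A = a | U = u) under the logging policy."""
    p1 = P_A1_GIVEN_U[u]
    return p1 if a == 1 else 1.0 - p1


def observational(a):
    """P(Y=1 | A=a): conditioning on A reweights U by P(U | A=a)."""
    joint = sum(P_Y1_GIVEN_AU[(a, u)] * p_action(a, u) * P_U[u] for u in P_U)
    marginal = sum(p_action(a, u) * P_U[u] for u in P_U)
    return joint / marginal


def interventional(a):
    """P(Y=1 | do(A=a)): setting A leaves U at its prior P(U)."""
    return sum(P_Y1_GIVEN_AU[(a, u)] * P_U[u] for u in P_U)


for a in (0, 1):
    print(f"A={a}: observational={observational(a):.2f}, "
          f"interventional={interventional(a):.2f}")
```

With these assumed numbers, action 1 looks better observationally (0.56 against 0.44), while action 0 is better under intervention (0.60 against 0.40), so a planner that trusts the observational estimate would prefer the worse action.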