Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a ma...


Bibliographic details
Main authors:	Pace, Alizée; Yèche, Hugo; Schölkopf, Bernhard; Rätsch, Gunnar; Tennenholtz, Guy
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Learning
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Pace, Alizée ; Yèche, Hugo ; Schölkopf, Bernhard ; Rätsch, Gunnar ; Tennenholtz, Guy
description	A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved confounders, and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.
doi_str_mv	10.48550/arxiv.2306.01157
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2306_01157</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2306_01157</sourcerecordid><originalsourceid>FETCH-LOGICAL-a677-343e6343ceecbd2bfcd04b43279796719fb32693e77cbe6675b7bf8b9165a66c3</originalsourceid><addsrcrecordid>eNotj8tOwzAQRb1hgUo_gBX-gYQ4jmfqJQqUIkWNhLqP_BiDpdSpzEPw95jC5o7u6MxIh7Fr0dTdRqnm1uSv-Fm3soG6EULhJRvvaT69RsfHEOaYiD9TTGHJjo6U3vlAJqeYXvhH8pT5fknRl30M0diZ-C76Unm_lJNCFPCKXQQzv9H6f67YYftw6HfVMD4-9XdDZQCxkp0kKOGInPWtDc43ne1kixo1oNDByha0JERnCQCVRRs2VgtQBsDJFbv5e3s2mk45Hk3-nn7NprOZ_AE9aUm1</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding</title><source>arXiv.org</source><creator>Pace, Alizée ; Yèche, Hugo ; Schölkopf, Bernhard ; Rätsch, Gunnar ; Tennenholtz, Guy</creator><creatorcontrib>Pace, Alizée ; Yèche, Hugo ; Schölkopf, Bernhard ; Rätsch, Gunnar ; Tennenholtz, Guy</creatorcontrib><description>A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. 
Our method does not assume identifiability of the unobserved confounders, and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.</description><identifier>DOI: 10.48550/arxiv.2306.01157</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Learning</subject><creationdate>2023-06</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2306.01157$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2306.01157$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Pace, Alizée</creatorcontrib><creatorcontrib>Yèche, Hugo</creatorcontrib><creatorcontrib>Schölkopf, Bernhard</creatorcontrib><creatorcontrib>Rätsch, Gunnar</creatorcontrib><creatorcontrib>Tennenholtz, Guy</creatorcontrib><title>Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding</title><description>A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. 
In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved confounders, and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8tOwzAQRb1hgUo_gBX-gYQ4jmfqJQqUIkWNhLqP_BiDpdSpzEPw95jC5o7u6MxIh7Fr0dTdRqnm1uSv-Fm3soG6EULhJRvvaT69RsfHEOaYiD9TTGHJjo6U3vlAJqeYXvhH8pT5fknRl30M0diZ-C76Unm_lJNCFPCKXQQzv9H6f67YYftw6HfVMD4-9XdDZQCxkp0kKOGInPWtDc43ne1kixo1oNDByha0JERnCQCVRRs2VgtQBsDJFbv5e3s2mk45Hk3-nn7NprOZ_AE9aUm1</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Pace, Alizée</creator><creator>Yèche, Hugo</creator><creator>Schölkopf, Bernhard</creator><creator>Rätsch, Gunnar</creator><creator>Tennenholtz, Guy</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230601</creationdate><title>Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding</title><author>Pace, Alizée ; Yèche, Hugo ; Schölkopf, Bernhard ; Rätsch, Gunnar ; Tennenholtz, 
Guy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a677-343e6343ceecbd2bfcd04b43279796719fb32693e77cbe6675b7bf8b9165a66c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Pace, Alizée</creatorcontrib><creatorcontrib>Yèche, Hugo</creatorcontrib><creatorcontrib>Schölkopf, Bernhard</creatorcontrib><creatorcontrib>Rätsch, Gunnar</creatorcontrib><creatorcontrib>Tennenholtz, Guy</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pace, Alizée</au><au>Yèche, Hugo</au><au>Schölkopf, Bernhard</au><au>Rätsch, Gunnar</au><au>Tennenholtz, Guy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding</atitle><date>2023-06-01</date><risdate>2023</risdate><abstract>A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. 
We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved confounders, and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.</abstract><doi>10.48550/arxiv.2306.01157</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2306.01157
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2306_01157
source	arXiv.org
subjects	Computer Science - Artificial Intelligence ; Computer Science - Learning
title	Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T13%3A00%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Delphic%20Offline%20Reinforcement%20Learning%20under%20Nonidentifiable%20Hidden%20Confounding&rft.au=Pace,%20Aliz%C3%A9e&rft.date=2023-06-01&rft_id=info:doi/10.48550/arxiv.2306.01157&rft_dat=%3Carxiv_GOX%3E2306_01157%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true