Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

Fu, Zuyue; Qi, Zhengling; Wang, Zhaoran; Yang, Zhuoran; Xu, Yanxun; Kosorok, Michael R.

arXiv:2209.08666 (cs)

[Submitted on 18 Sep 2022]

Abstract: We study offline reinforcement learning (RL) in the presence of unmeasured confounders. Because it lacks online interaction with the environment, offline RL faces two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data collected a priori does not provide sufficient coverage of the environment. To tackle these challenges, we study policy learning in confounded MDPs with the aid of instrumental variables. Specifically, we first establish value function (VF)-based and marginalized importance sampling (MIS)-based identification results for the expected total reward in confounded MDPs. Then, by leveraging pessimism and our identification results, we propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy under minimal data coverage and modeling assumptions. Lastly, our extensive theoretical investigations and one numerical study motivated by kidney transplantation demonstrate the promising performance of the proposed methods.

Subjects:	Machine Learning (cs.LG); Methodology (stat.ME)
Cite as:	arXiv:2209.08666 [cs.LG]
	(or arXiv:2209.08666v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.08666

Submission history

From: Zuyue Fu
[v1] Sun, 18 Sep 2022 22:03:55 UTC (1,074 KB)
