Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Uehara, Masatoshi; Kiyohara, Haruka; Bennett, Andrew; Chernozhukov, Victor; Jiang, Nan; Kallus, Nathan; Shi, Chengchun; Sun, Wen

arXiv:2207.13081 (cs)

[Submitted on 26 Jul 2022 (v1), last revised 14 Nov 2023 (this version, v2)]

Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play a role similar to that of classical value functions in fully observable MDPs. We derive a new Bellman equation for future-dependent value functions in the form of conditional moment equations that use history proxies as instrumental variables. We further propose a minimax method for learning future-dependent value functions from the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and Bellman completeness holds. Finally, we extend our methods to learning dynamics and establish a connection between our approach and the well-known spectral learning methods for POMDPs.
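
The minimax step described in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes the conditional moment equation has the form E[mu * (R + gamma * b(F')) - b(F) | H] = 0, where mu is an importance weight between evaluation and behavior policies, b is the future-dependent value function, F and F' are future proxies, and H is the history proxy. It further assumes linear function classes for both b and the critic, in which case the inner maximization has a closed form and the estimator reduces to a two-stage-least-squares-style solve. All names (fit_linear_minimax, phi, psi, mu) and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_minimax(phi, phi_next, psi, mu, reward, gamma):
    """Closed-form minimax solution under linear classes (illustrative only).

    Assumed moment condition (our reading, not quoted from the abstract):
        E[ mu * (R + gamma * b(F')) - b(F) | H ] = 0,  b(f) = phi(f) @ theta.
    With linear critics xi(h) = psi(h) @ w and a quadratic penalty on xi,
    maximizing over w leaves a weighted least-squares problem in theta.
    """
    n = phi.shape[0]
    # Empirical unconditional moments: A @ theta should equal b.
    A = psi.T @ (phi - gamma * mu[:, None] * phi_next) / n
    b = psi.T @ (mu * reward) / n
    # Weight by the (pseudo-)inverse second moment of the instruments.
    W = np.linalg.pinv(psi.T @ psi / n)
    # Solve the normal equations A^T W A theta = A^T W b.
    theta, *_ = np.linalg.lstsq(A.T @ W @ A, A.T @ W @ b, rcond=None)
    return theta


# Synthetic data (hypothetical): futures and histories are noisy views of
# the same latent state, so histories are informative instruments.
rng = np.random.default_rng(0)
n, d, gamma = 500, 3, 0.9
latent = rng.normal(size=(n, d))
phi = latent + 0.1 * rng.normal(size=(n, d))        # features of F
phi_next = rng.normal(size=(n, d))                  # features of F'
psi = np.hstack([latent + 0.1 * rng.normal(size=(n, d)), np.ones((n, 1))])
mu = rng.uniform(0.5, 1.5, size=n)                  # importance weights

# Choose rewards so the assumed moment condition holds exactly at theta_star.
theta_star = np.array([1.0, -2.0, 0.5])
reward = (phi @ theta_star) / mu - gamma * (phi_next @ theta_star)

theta_hat = fit_linear_minimax(phi, phi_next, psi, mu, reward, gamma)
print(np.round(theta_hat, 6))
```

Because the rewards are constructed so that the assumed moment condition holds exactly in the sample, the solve should recover theta_star up to numerical error. With real data the moments hold only in expectation, and richer function classes would require an iterative minimax optimizer instead of this closed form.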

Comments:	This paper was accepted at NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2207.13081 [cs.LG]
	(or arXiv:2207.13081v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2207.13081

Submission history

From: Masatoshi Uehara
[v1] Tue, 26 Jul 2022 17:53:29 UTC (198 KB)
[v2] Tue, 14 Nov 2023 22:16:28 UTC (1,478 KB)
