Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

Uehara, Masatoshi; Kallus, Nathan; Lee, Jason D.; Sun, Wen

arXiv:2302.02392 (cs)

[Submitted on 5 Feb 2023 (v1), last revised 13 Nov 2023 (this version, v2)]

Abstract: In offline reinforcement learning (RL) we have no opportunity to explore, so we must assume that the data is sufficient to guide the choice of a good policy. Such assumptions take the form of some combination of coverage, realizability, Bellman completeness, and/or a hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under only partial coverage, specifically coverage of a single comparator policy, together with realizability of the soft (entropy-regularized) Q-function of that single policy and of a related function defined as a saddle point of a certain minimax optimization problem. This offers refined and generally more lax conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft margin condition. To attain these guarantees, we leverage novel minimax learning algorithms to accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
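
For readers unfamiliar with the term, the idea of a soft (entropy-regularized) Q-function can be illustrated on a small tabular MDP. The sketch below is not the paper's minimax estimator: it is ordinary soft Q-iteration with known dynamics, using the common log-sum-exp form of the soft value, and the paper's exact regularizer may differ. The function names, the temperature `alpha`, and the input layout are all assumptions made for illustration.

```python
import math

def soft_value(q_row, alpha):
    """Soft value alpha * log(sum_a exp(q / alpha)), computed stably."""
    m = max(q_row)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_row))

def soft_q_iteration(P, R, gamma=0.9, alpha=1.0, iters=500):
    """Iterate the soft Bellman backup on a tabular MDP (illustrative only).

    P[s][a][s2] holds transition probabilities, R[s][a] holds rewards.
    """
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        # Soft value of each state under the current Q estimate.
        V = [soft_value(Q[s], alpha) for s in range(n_s)]
        # Backup: reward plus discounted expected soft value of the next state.
        Q = [[R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return Q

def soft_policy(Q, alpha=1.0):
    """Softmax policy with pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    policy = []
    for row in Q:
        m = max(row)
        weights = [math.exp((q - m) / alpha) for q in row]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy
```

As `alpha` shrinks toward zero the soft value approaches the hard maximum, which is one informal way to see how results for soft Q-functions relate to those for vanilla Q-functions.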

Comments:	The original title of this paper was "Refined Value-Based Offline RL under Realizability and Partial Coverage," but it was later changed. This paper has been accepted for NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2302.02392 [cs.LG]
	(or arXiv:2302.02392v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.02392

Submission history

From: Masatoshi Uehara
[v1] Sun, 5 Feb 2023 14:22:41 UTC (54 KB)
[v2] Mon, 13 Nov 2023 14:46:44 UTC (46 KB)
