Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

Yao, Qingmao; Lei, Zhichao; Chen, Tianyuan; Yuan, Ziyue; Chen, Xuefan; Liu, Jianxiang; Wu, Faguo; Zhang, Xiao

Computer Science > Machine Learning

arXiv:2506.08417 (cs)

[Submitted on 10 Jun 2025]



Abstract: Offline Reinforcement Learning (RL) struggles with distributional shift, which leads to $Q$-value overestimation for out-of-distribution (OOD) actions. Existing methods address this issue by imposing constraints; however, they often become overly conservative when evaluating OOD regions, which limits the generalization of the $Q$-function. This over-constraint results in poor $Q$-value estimation and hinders policy improvement. In this paper, we introduce a novel approach that achieves better $Q$-value estimation by enhancing $Q$-function generalization in OOD regions within the Convex Hull and its Neighborhood (CHN). Under the safe-generalization guarantees of the CHN, we propose the Smooth Bellman Operator (SBO), which updates OOD $Q$-values by smoothing them with neighboring in-sample $Q$-values. We theoretically show that SBO approximates the true $Q$-values for both in-sample and OOD actions within the CHN. Our practical algorithm, Smooth Q-function OOD Generalization (SQOG), empirically alleviates the over-constraint issue and achieves near-accurate $Q$-value estimation. On the D4RL benchmarks, SQOG outperforms existing state-of-the-art methods in both performance and computational efficiency.
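To make the two ideas in the abstract concrete, here is a minimal sketch for a 2-D action space. It is not the paper's SBO or SQOG. It assumes that "in the CHN" means "within Euclidean distance `radius` of the convex hull of the dataset actions". It also assumes that "smoothing with neighboring in-sample Q-values" can be illustrated by averaging the Q-values of the `k` nearest dataset actions. The names `in_chn`, `smoothed_target`, `radius` and `k` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means o -> a -> b turns counter-clockwise
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    # Andrew's monotone chain; returns hull vertices in counter-clockwise order
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def segment_distance(p: Point, a: Point, b: Point) -> float:
    # Euclidean distance from p to the closed segment [a, b]
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_hull(p: Point, hull: List[Point]) -> float:
    # 0.0 inside a counter-clockwise hull, otherwise the distance to its nearest edge
    if len(hull) == 1:
        return math.dist(p, hull[0])
    n = len(hull)
    if n >= 3 and all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n)):
        return 0.0
    return min(segment_distance(p, hull[i], hull[(i + 1) % n]) for i in range(n))


def in_chn(action: Point, dataset_actions: Sequence[Point], radius: float) -> bool:
    # Hypothetical CHN test: the action lies within `radius` of the dataset's convex hull
    return distance_to_hull(action, convex_hull(dataset_actions)) <= radius


def smoothed_target(q: Callable[[Point], float], action: Point,
                    dataset_actions: Sequence[Point], k: int = 2) -> float:
    # Hypothetical smoothing rule: mean Q-value of the k nearest in-sample actions
    nearest = sorted(dataset_actions, key=lambda a: math.dist(a, action))[:k]
    return sum(q(a) for a in nearest) / len(nearest)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
    print(in_chn((1.2, 0.5), data, radius=0.25))                 # True
    print(smoothed_target(lambda a: 2 * a[0], (0.9, 0.1), data))  # 1.5
```

The convex-hull test keeps the smoothed targets inside a region the data supports. Actions far outside the hull would receive no such target in this sketch. How the paper actually samples OOD actions and weights neighboring $Q$-values is not stated in the abstract.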

Comments:	ICLR 2025
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.08417 [cs.LG]
	(or arXiv:2506.08417v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.08417

Submission history

From: Qingmao Yao
[v1] Tue, 10 Jun 2025 03:43:22 UTC (7,516 KB)
