Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation

Dai, Juntao; Yang, Yaodong; Zheng, Qian; Pan, Gang


arXiv:2412.11138 (cs)

[Submitted on 15 Dec 2024]



Abstract: A key aspect of Safe Reinforcement Learning (Safe RL) is estimating the constraint condition for the next policy, which is crucial for guiding safe policy updates. However, the existing Advantage-based Estimation (ABE) method relies on the infinite-horizon discounted advantage function. This dependence leads to catastrophic errors in finite-horizon scenarios with non-discounted constraints, resulting in updates that violate safety constraints. In response, we propose the first estimation method for finite-horizon non-discounted constraints in deep Safe RL, termed Gradient-based Estimation (GBE), which relies on the analytic gradient derived along trajectories. Our theoretical and empirical analyses demonstrate that GBE can effectively estimate constraint changes over a finite horizon. By constructing a surrogate optimization problem with GBE, we develop a novel Safe RL algorithm called Constrained Gradient-based Policy Optimization (CGPO). CGPO identifies feasible optimal policies by iteratively solving sub-problems within trust regions. Our empirical results reveal that CGPO, unlike baseline algorithms, successfully estimates the constraint functions of subsequent policies, thereby ensuring the efficiency and feasibility of each update.
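To make the idea of estimating a finite-horizon, non-discounted constraint from an analytic gradient along a trajectory more concrete, the following is a minimal sketch on a toy problem. It is not the paper's implementation: the scalar linear dynamics, the linear policy `u = theta * x`, the quadratic per-step cost, and the function names `rollout_cost_and_grad` and `estimate_next_cost` are all assumptions chosen for illustration. The sketch differentiates the cost through the rollout by hand and uses a first-order expansion to predict the constraint value after a small parameter update.

```python
def rollout_cost_and_grad(theta, x0=1.0, a=0.9, b=0.5, horizon=20):
    """Toy finite-horizon, non-discounted cost J_C(theta) and its gradient.

    Hypothetical setup: dynamics x' = a*x + b*u with policy u = theta*x,
    per-step cost c(x) = x**2 summed over `horizon` steps without discounting.
    """
    x, dx = x0, 0.0  # state and its derivative with respect to theta
    cost, grad = 0.0, 0.0
    for _ in range(horizon):
        k = a + b * theta  # closed-loop gain
        # Product rule on x' = k * x; the right-hand side uses the old x and dx.
        x, dx = k * x, k * dx + b * x
        cost += x * x          # accumulate the undiscounted cost
        grad += 2.0 * x * dx   # chain rule on x**2
    return cost, grad


def estimate_next_cost(theta, delta):
    """First-order estimate of J_C(theta + delta) from the analytic gradient."""
    cost, grad = rollout_cost_and_grad(theta)
    return cost + grad * delta


if __name__ == "__main__":
    theta, delta = -0.4, 0.01
    predicted = estimate_next_cost(theta, delta)
    actual, _ = rollout_cost_and_grad(theta + delta)
    print(f"predicted: {predicted:.5f}  actual: {actual:.5f}")
```

In this toy setting the prediction is accurate only for small `delta`, which is one way to read the role of a trust region: it limits the update to the range where a local gradient-based estimate of the constraint can be trusted.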

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.11138 [cs.LG]
	(or arXiv:2412.11138v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.11138
Journal reference:	Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9872-9903, 2024

Submission history

From: Josef Dai
[v1] Sun, 15 Dec 2024 10:05:23 UTC (2,830 KB)
