Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

Chen, Liyu; Jain, Rahul; Luo, Haipeng

arXiv:2202.00150 (cs)

[Submitted on 31 Jan 2022]

Abstract: We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. We first design a policy optimization algorithm with a carefully constructed action-value estimator and bonus term, and show that for ergodic MDPs it ensures $\widetilde{O}(\sqrt{T})$ regret and constant constraint violation, where $T$ is the total number of time steps. This strictly improves over the algorithm of Singh et al. (2020), whose regret and constraint violation are both $\widetilde{O}(T^{2/3})$. Next, we consider the most general class of weakly communicating MDPs. Through a finite-horizon approximation, we develop another algorithm with $\widetilde{O}(T^{2/3})$ regret and constraint violation, which can be further improved to $\widetilde{O}(\sqrt{T})$ via a simple modification, at the cost of making the algorithm computationally inefficient. As far as we know, these are the first provable algorithms for weakly communicating MDPs with cost constraints.
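
To give a rough sense of the gap between the two rates quoted in the abstract, the sketch below compares $\sqrt{T}$ with $T^{2/3}$ for a few horizons. It is an illustration only: the constants and logarithmic factors hidden by $\widetilde{O}(\cdot)$ are not stated in the abstract and are ignored here, and the function names are hypothetical rather than taken from the paper.

```python
import math


def sqrt_rate(T: int) -> float:
    """Leading-order term of an O~(sqrt(T)) bound (constants and logs dropped)."""
    return math.sqrt(T)


def two_thirds_rate(T: int) -> float:
    """Leading-order term of an O~(T^(2/3)) bound (constants and logs dropped)."""
    return T ** (2.0 / 3.0)


def rate_gap(T: int) -> float:
    """Ratio T^(2/3) / sqrt(T), which equals T^(1/6) and grows with T."""
    return two_thirds_rate(T) / sqrt_rate(T)


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        print(f"T={T:>10}  sqrt(T)={sqrt_rate(T):12.1f}  "
              f"T^(2/3)={two_thirds_rate(T):12.1f}  ratio={rate_gap(T):6.2f}")
```

The ratio between the two leading terms is $T^{1/6}$, so the advantage of a $\sqrt{T}$ bound widens slowly but without limit as the horizon grows.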

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2202.00150 [cs.LG]
	(or arXiv:2202.00150v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.00150

Submission history

From: Liyu Chen
[v1] Mon, 31 Jan 2022 23:52:34 UTC (74 KB)
