Large Scale Markov Decision Processes with Changing Rewards

Cardoso, Adrian Rivera; Wang, He; Xu, Huan

Computer Science > Machine Learning

arXiv:1905.10649 (cs)

[Submitted on 25 May 2019]

Abstract: We consider Markov Decision Processes (MDPs) where the rewards are unknown and may change in an adversarial manner. We provide an algorithm that achieves a state-of-the-art regret bound of $O( \sqrt{\tau (\ln|S|+\ln|A|)T}\ln(T))$, where $S$ is the state space, $A$ is the action space, $\tau$ is the mixing time of the MDP, and $T$ is the number of periods. The algorithm's computational complexity is polynomial in $|S|$ and $|A|$ per period. We then consider a setting often encountered in practice, where the state space of the MDP is too large to allow for exact solutions. By approximating the state-action occupancy measures with a linear architecture of dimension $d\ll|S|$, we propose a modified algorithm with computational complexity polynomial in $d$. We also prove a regret bound for this modified algorithm, which, to the best of our knowledge, is the first $\tilde{O}(\sqrt{T})$ regret bound for large-scale MDPs with changing rewards.
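The abstract relies on two objects worth making concrete: the state-action occupancy measure of a policy and the shape of the regret bound. The sketch below is illustrative only and is not the authors' algorithm. It computes the stationary occupancy measure $\mu(s,a) = \nu(s)\,\pi(a \mid s)$ of a fixed policy on a small, hypothetical MDP by power iteration. It assumes the chain induced by the policy is ergodic, which is consistent with a finite mixing time $\tau$. It then shows that the long-run average reward $\sum_{s,a} \mu(s,a)\,r(s,a)$ is linear in $\mu$. Our reading, not a claim from the paper, is that this linearity is what makes a linear architecture for $\mu$ a natural way to reduce the dimension. The last function evaluates the bound $\sqrt{\tau(\ln|S|+\ln|A|)T}\ln T$ with the constants hidden by $O(\cdot)$ dropped. The names `stationary_occupancy`, `average_reward`, and `regret_bound_shape` are hypothetical.

```python
import math


def stationary_occupancy(P, pi, iters=10_000, tol=1e-12):
    """Stationary state-action occupancy measure mu(s, a) of a fixed policy.

    P[s][a][s2] is the probability of moving from s to s2 under action a.
    pi[s][a] is the probability that the policy takes action a in state s.
    Assumes the state chain induced by pi is ergodic, so power iteration converges.
    """
    n_s, n_a = len(P), len(P[0])
    # State transition matrix under pi: P_pi[s][s2] = sum_a pi(a|s) * P(s2|s,a)
    P_pi = [[sum(pi[s][a] * P[s][a][s2] for a in range(n_a)) for s2 in range(n_s)]
            for s in range(n_s)]
    nu = [1.0 / n_s] * n_s  # start from the uniform state distribution
    for _ in range(iters):
        new_nu = [sum(nu[s] * P_pi[s][s2] for s in range(n_s)) for s2 in range(n_s)]
        converged = max(abs(x - y) for x, y in zip(new_nu, nu)) < tol
        nu = new_nu
        if converged:
            break
    # mu(s, a) = nu(s) * pi(a | s)
    return [[nu[s] * pi[s][a] for a in range(n_a)] for s in range(n_s)]


def average_reward(mu, r):
    """Long-run average reward, a linear function of mu: sum_{s,a} mu(s,a) r(s,a)."""
    return sum(m * x for row_m, row_r in zip(mu, r) for m, x in zip(row_m, row_r))


def regret_bound_shape(T, n_states, n_actions, tau):
    """sqrt(tau * (ln|S| + ln|A|) * T) * ln(T), with the O(.) constants dropped."""
    return math.sqrt(tau * (math.log(n_states) + math.log(n_actions)) * T) * math.log(T)


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP and a policy that always picks action 0.
    P = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.5, 0.5], [0.1, 0.9]]]
    pi = [[1.0, 0.0], [1.0, 0.0]]
    r = [[1.0, 0.0], [0.0, 0.0]]
    mu = stationary_occupancy(P, pi)
    print("occupancy measure:", mu)
    print("average reward:", average_reward(mu, r))
    for T in (10**4, 10**6):
        # Average regret per period shrinks as T grows, since the bound is sublinear in T.
        print(T, regret_bound_shape(T, 100, 10, 5.0) / T)
```

In this example the induced state chain has transition rows $(0.9, 0.1)$ and $(0.5, 0.5)$, so its stationary distribution is $(5/6, 1/6)$ and the average reward is $5/6$. Dividing the bound by $T$ shows the per-period regret vanishing at rate roughly $\ln T/\sqrt{T}$.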

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1905.10649 [cs.LG]
	(or arXiv:1905.10649v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1905.10649

Submission history

From: Adrian Rivera Cardoso
[v1] Sat, 25 May 2019 18:26:49 UTC (34 KB)
