Deterministic MDPs with Adversarial Rewards and Bandit Feedback

Arora, Raman; Dekel, Ofer; Tewari, Ambuj


arXiv:1210.4843 (cs)

[Submitted on 16 Oct 2012]


Abstract: We consider a Markov decision process with deterministic state transition dynamics, adversarially generated rewards that change arbitrarily from round to round, and a bandit feedback model in which the decision maker only observes the rewards it receives. In this setting, we present a novel and efficient online decision-making algorithm named MarcoPolo. Under mild assumptions on the structure of the transition dynamics, we prove that MarcoPolo enjoys a regret of O(T^(3/4) sqrt(log T)) against the best deterministic policy in hindsight. In particular, our analysis does not rely on the stringent unichain assumption, which dominates much of the previous work on this topic.
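
To make the benchmark in the abstract concrete, here is a minimal Python sketch of regret against the best deterministic policy in hindsight in a deterministic MDP. It is not MarcoPolo and does not learn anything; it only evaluates regret offline. It assumes (our assumption, not necessarily the paper's exact definition) a known start state s0, stationary policies mapping each state to one action, and per-round reward tables indexed by state and action. The function names and the data layout are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical representation (an assumption for illustration):
#   transition[s][a] -> next state (deterministic dynamics)
#   rewards[t][s][a] -> reward of action a in state s at round t
Transition = List[List[int]]
RewardTable = List[List[float]]


def rollout_reward(policy: Dict[int, int], transition: Transition,
                   rewards: Sequence[RewardTable], s0: int) -> float:
    """Total reward a fixed deterministic policy would have collected from s0."""
    s, total = s0, 0.0
    for r_t in rewards:
        a = policy[s]
        total += r_t[s][a]
        s = transition[s][a]
    return total


def regret(learner_actions: Sequence[int], transition: Transition,
           rewards: Sequence[RewardTable], s0: int) -> float:
    """Best deterministic stationary policy in hindsight minus the learner's reward.

    Computed offline with the full reward tables; under bandit feedback the
    learner itself only ever observes rewards[t][s_t][a_t].
    """
    n_states, n_actions = len(transition), len(transition[0])

    # Learner's realized reward along its own trajectory.
    s, learner_total = s0, 0.0
    for r_t, a in zip(rewards, learner_actions):
        learner_total += r_t[s][a]
        s = transition[s][a]

    # Brute force over all |A|^|S| deterministic policies (tiny examples only).
    best = max(
        rollout_reward(dict(enumerate(choice)), transition, rewards, s0)
        for choice in product(range(n_actions), repeat=n_states)
    )
    return best - learner_total


# Two states, two actions: action 1 switches state, action 0 stays put.
transition = [[0, 1], [1, 0]]
# Reward 1 only for staying in state 1; constant over T = 4 rounds for simplicity.
rewards = [[[0.0, 0.0], [1.0, 0.0]]] * 4

print(regret([0, 0, 0, 0], transition, rewards, s0=0))  # 3.0
print(regret([1, 0, 0, 0], transition, rewards, s0=0))  # 0.0
```

The brute-force maximum is exponential in the number of states, so it serves only as an evaluation tool for toy instances; the abstract describes MarcoPolo as efficient, and nothing in this sketch reflects how it works.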

Comments:	Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)
Subjects:	Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Report number:	UAI-P-2012-PG-93-101
Cite as:	arXiv:1210.4843 [cs.GT]
	(or arXiv:1210.4843v1 [cs.GT] for this version)
	https://doi.org/10.48550/arXiv.1210.4843

Submission history

From: Raman Arora [via AUAI proxy]
[v1] Tue, 16 Oct 2012 17:34:04 UTC (182 KB)
