Robust Temporal Difference Learning for Critical Domains

Klima, Richard; Bloembergen, Daan; Kaisers, Michael; Tuyls, Karl

arXiv:1901.08021 (cs)

[Submitted on 23 Jan 2019 (v1), last revised 13 Mar 2019 (this version, v2)]

Abstract: We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SRE) in critical domains. The operator, which we call the $\kappa$-operator, makes it possible to learn a robust policy in a model-based fashion without actually observing the SRE. We introduce single- and multi-agent robust TD methods using the operator $\kappa$. We prove convergence of the operator to the optimal robust Q-function with respect to the model using the theory of Generalized Markov Decision Processes. In addition, we prove convergence to the optimal Q-function of the original MDP given that the probability of SREs vanishes. Empirical evaluations demonstrate the superior performance of $\kappa$-based TD methods both in the early learning phase and in the final converged stage. We also show robustness of the proposed method to small model errors, as well as its applicability in a multi-agent context.
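
The abstract does not state the form of the $\kappa$-operator, so the following is only a minimal sketch under a stated assumption: that the robust TD target mixes the best-case and worst-case next-state values, with weight kappa on the worst case standing in for the modeled probability of an SRE. The function names (kappa_target, kappa_q_update) and all parameter defaults are hypothetical and are not taken from the paper. Under this assumption, setting kappa to 0 recovers the ordinary Q-learning target, which is in line with the stated convergence to the original MDP's optimal Q-function as the SRE probability vanishes.

```python
def kappa_target(q_next, reward, gamma, kappa):
    """Assumed robust TD target (illustrative, not the paper's definition).

    q_next: list of Q-values for every action in the next state.
    kappa:  assumed weight in [0, 1] placed on the worst-case action.
    """
    best = max(q_next)    # value if the agent keeps control
    worst = min(q_next)   # value if the rare event forces the worst action
    return reward + gamma * ((1.0 - kappa) * best + kappa * worst)


def kappa_q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9, kappa=0.1):
    """One tabular update step using the assumed target above.

    Q is a list of lists indexed as Q[state][action]; it is modified in place.
    """
    target = kappa_target(Q[s_next], reward, gamma, kappa)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

With kappa = 1 the same sketch always backs up the worst next-state value, so kappa acts as a dial between standard and fully pessimistic value backups.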

Comments:	AAMAS 2019
Subjects:	Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Cite as:	arXiv:1901.08021 [cs.LG]
	(or arXiv:1901.08021v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1901.08021

Submission history

From: Richard Klima
[v1] Wed, 23 Jan 2019 17:34:51 UTC (284 KB)
[v2] Wed, 13 Mar 2019 09:27:46 UTC (616 KB)
