Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Sonabend-W, Aaron; Lu, Junwei; Celi, Leo A.; Cai, Tianxi; Szolovits, Peter

arXiv:2006.13189 (cs)

[Submitted on 23 Jun 2020 (v1), last revised 30 Oct 2020 (this version, v2)]

Abstract: Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or infeasible. However, adopting such policies in practice is often challenging, as they are hard to interpret within the application context and lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning. In particular, we make three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk-averse implementations tailored to the application context, and 3) we propose a way to interpret ESRL's policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds consistent with Posterior Sampling for RL (PSRL). The sample efficiency of ESRL is independent of the chosen risk aversion threshold and the quality of the behavior policy.
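As a rough illustration of the hypothesis-testing idea described in the abstract, the sketch below decides, at a single state, whether to deviate from the behavior policy's action based on posterior samples of action values. It is not the authors' implementation: the function name `choose_action`, the sample format, the action labels, and the specific test (reject the null that the behavior action is at least as good as the candidate when its posterior probability falls below `alpha`) are all assumptions made for illustration.

```python
import random


def choose_action(q_samples, behavior_action, alpha):
    """Pick an action at one state from posterior Q-value samples.

    q_samples: list of dicts mapping action -> sampled Q-value
        (hypothetical format, one dict per posterior draw).
    behavior_action: the action the behavior (expert) policy would take.
    alpha: risk-aversion threshold in (0, 1); smaller values defer to
        the behavior policy more often.
    Returns (chosen_action, p_null).
    """
    actions = list(q_samples[0])
    n = len(q_samples)

    # Candidate action: highest posterior mean Q-value.
    means = {a: sum(s[a] for s in q_samples) / n for a in actions}
    candidate = max(actions, key=means.get)

    # Posterior probability of the null hypothesis that the behavior
    # action is at least as good as the candidate.
    p_null = sum(s[behavior_action] >= s[candidate] for s in q_samples) / n

    # Deviate from the behavior policy only when the null is rejected.
    chosen = candidate if p_null < alpha else behavior_action
    return chosen, p_null


if __name__ == "__main__":
    # Toy posterior draws; the Gaussian parameters are made up.
    rng = random.Random(0)
    samples = [
        {"treat": rng.gauss(1.0, 0.5), "wait": rng.gauss(0.8, 0.5)}
        for _ in range(1000)
    ]
    for alpha in (0.01, 0.25, 0.5):
        print(alpha, choose_action(samples, "wait", alpha))
```

In this sketch, `p_null` doubles as a per-state summary of how strongly the posterior supports the behavior action, and `alpha` plays the role of a risk-aversion threshold: lowering it makes the learned policy follow the behavior policy in more states.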

Comments:	to be published in NeurIPS 2020
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:2006.13189 [cs.LG]
	(or arXiv:2006.13189v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.13189

Submission history

From: Aaron Sonabend
[v1] Tue, 23 Jun 2020 17:43:44 UTC (1,133 KB)
[v2] Fri, 30 Oct 2020 19:14:14 UTC (1,895 KB)
