Bayesian Robust Optimization for Imitation Learning

Brown, Daniel S.; Niekum, Scott; Petrik, Marek

arXiv:2007.12315 (cs)

[Submitted on 24 Jul 2020 (v1), last revised 1 Mar 2024 (this version, v4)]

Abstract: One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and the corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or the MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic, as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user-specified risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms. Code is available at this https URL.
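
To make the trade-off between expected return and conditional value at risk (CVaR) concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the policy's return has already been evaluated under a finite set of equally weighted reward samples from a posterior, and it assumes a convex combination with weight `lam` as the way the two terms are balanced; the abstract itself does not give the exact objective. The names `cvar`, `blended_objective`, `pick_policy`, and the toy return values are all hypothetical.

```python
import math


def cvar(returns, alpha):
    """Sample CVaR for returns (higher is better).

    Averages the worst (1 - alpha) fraction of equally weighted samples.
    With alpha = 0 this reduces to the plain mean.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(returns)  # ascending: worst outcomes first
    # round() guards against floating-point noise such as (1 - 0.95) * 20
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    worst = ordered[:tail_size]
    return sum(worst) / tail_size


def blended_objective(returns, alpha, lam):
    """Assumed objective: lam * mean + (1 - lam) * CVaR_alpha."""
    mean = sum(returns) / len(returns)
    return lam * mean + (1.0 - lam) * cvar(returns, alpha)


def pick_policy(candidates, alpha, lam):
    """Return the name of the candidate with the highest blended objective."""
    return max(candidates, key=lambda name: blended_objective(candidates[name], alpha, lam))


# Hypothetical returns of two policies under four posterior reward samples.
candidates = {
    "aggressive": [10.0, 10.0, 10.0, -6.0],  # high mean, bad worst case
    "cautious": [3.0, 3.0, 2.0, 2.0],        # lower mean, mild worst case
}

risk_neutral_choice = pick_policy(candidates, alpha=0.75, lam=1.0)
risk_averse_choice = pick_policy(candidates, alpha=0.75, lam=0.0)
```

In this toy setup, `lam = 1.0` scores policies by mean return alone, while `lam = 0.0` scores them by tail risk alone; intermediate values interpolate between the two behaviors.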

Comments:	In proceedings NeurIPS 2020
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2007.12315 [cs.LG]
	(or arXiv:2007.12315v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2007.12315

Submission history

From: Daniel Brown
[v1] Fri, 24 Jul 2020 01:52:11 UTC (1,620 KB)
[v2] Tue, 4 Aug 2020 17:50:07 UTC (1,622 KB)
[v3] Sun, 8 Nov 2020 04:16:15 UTC (823 KB)
[v4] Fri, 1 Mar 2024 04:31:22 UTC (970 KB)
