Showing 1–9 of 9 results for author: Petrik, M

  1. arXiv:2312.03618  [pdf, other]

    math.OC cs.GT

    Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

    Authors: Julien Grand-Clément, Marek Petrik, Nicolas Vieille

    Abstract: Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known for average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discou…

    Submitted 14 January, 2025; v1 submitted 6 December, 2023; originally announced December 2023.

  2. arXiv:2304.12477  [pdf, other]

    math.OC cs.AI

    On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

    Authors: Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik

    Abstract: Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal…

    Submitted 23 April, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), 2023

  3. arXiv:2209.10187  [pdf, other]

    math.OC cs.LG

    On the convex formulations of robust Markov decision processes

    Authors: Julien Grand-Clément, Marek Petrik

    Abstract: Robust Markov decision processes (MDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the…

    Submitted 13 December, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

  4. arXiv:2205.14202  [pdf, other]

    math.OC cs.LG

    Robust Phi-Divergence MDPs

    Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

    Abstract: In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty. In contrast to classical MDPs, which only account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, robust MDPs additionally account for ambiguity by optimizing in view of the most advers…

    Submitted 12 January, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), 2022

  5. arXiv:2011.14495  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Soft-Robust Algorithms for Batch Reinforcement Learning

    Authors: Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik

    Abstract: In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative as the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance. To overcome the…

    Submitted 26 February, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

  6. arXiv:2006.11679  [pdf, other]

    cs.LG math.OC stat.ML

    Entropic Risk Constrained Soft-Robust Policy Optimization

    Authors: Reazul Hasan Russel, Bahram Behzadian, Marek Petrik

    Abstract: Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning. It is important in high-stakes domains to quantify and manage risk induced by model uncertainties. Entropic risk measure is an exponential utility-based convex risk measure that satisfies many reasonable properties. In this paper, we propose an entropic risk constrained policy gradient and actor-cri…

    Submitted 20 June, 2020; originally announced June 2020.

  7. arXiv:2006.09484  [pdf, other]

    cs.LG math.OC stat.ML

    Partial Policy Iteration for L1-Robust Markov Decision Processes

    Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

    Abstract: Robust Markov decision processes (MDPs) allow to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which severely limits their scalability. This paper describ…

    Submitted 16 June, 2020; originally announced June 2020.

  8. arXiv:1506.04514  [pdf, other]

    math.OC

    Robust Policy Optimization with Baseline Guarantees

    Authors: Yinlam Chow, Marek Petrik, Mohammad Ghavamzadeh

    Abstract: Our goal is to compute a policy that guarantees improved return over a baseline policy even when the available MDP model is inaccurate. The inaccurate model may be constructed, for example, by system identification techniques when the true model is inaccessible. When the modeling error is large, the standard solution to the constructed model has no performance guarantees with respect to the true m…

    Submitted 15 June, 2015; v1 submitted 15 June, 2015; originally announced June 2015.

  9. arXiv:1106.6102  [pdf, other]

    q-fin.RM math.OC

    Tight Approximations of Dynamic Risk Measures

    Authors: Dan A. Iancu, Marek Petrik, Dharmashankar Subramanian

    Abstract: This paper compares two different frameworks recently introduced in the literature for measuring risk in a multi-period setting. The first corresponds to applying a single coherent risk measure to the cumulative future costs, while the second involves applying a composition of one-step coherent risk mappings. We summarize the relative strengths of the two methods, characterize several necessary an…

    Submitted 23 August, 2013; v1 submitted 29 June, 2011; originally announced June 2011.