Search | arXiv e-print repository

arXiv:1907.02347 [pdf, other]

doi 10.4064/bc122-2

Markov Decision Processes under Ambiguity

Abstract: We consider statistical Markov Decision Processes where the decision maker is risk averse against model ambiguity. The latter is given by an unknown parameter which influences the transition law and the cost functions. Risk aversion is either measured by the entropic risk measure or by the Average Value at Risk. We show how to solve these kinds of problems using a general minimax theorem. Under some continuity and compactness assumptions we prove the existence of an optimal (deterministic) policy and discuss its computation. We illustrate our results using an example from statistical decision theory.

Submitted 4 July, 2019; originally announced July 2019.

MSC Class: 90C40; 62C05

Journal ref: Banach Center Publications 122 (2020), 25-39

arXiv:1703.09509 [pdf, other]

Partially Observable Risk-Sensitive Stopping Problems in Discrete Time

Authors: Nicole Bäuerle, Ulrich Rieder

Abstract: In this paper we consider stopping problems with partial observation under a general risk-sensitive optimization criterion for problems with finite and infinite time horizon. Our aim is to maximize the certainty equivalent of the stopping reward. We develop a general theory and discuss the Bayesian risk-sensitive house selling problem as a special example. In particular we are able to study the influence of the attitude towards risk of the decision maker on the optimal stopping rule.

Submitted 28 March, 2017; originally announced March 2017.

MSC Class: 60J27; 90C40

Journal ref: Modern trends of controlled stochastic processes: Theory and Applications, vol.II (A.B. Piunovskiy ed). Luniver Press, 12-31, 2015

arXiv:1604.01896 [pdf, other]

doi 10.1016/j.spa.2016.06.020

Zero-sum Risk-Sensitive Stochastic Games

Authors: Nicole Bäuerle, Ulrich Rieder

Abstract: In this paper we consider two-person zero-sum risk-sensitive stochastic dynamic games with Borel state and action spaces and bounded reward. The term risk-sensitive refers to the fact that instead of the usual risk neutral optimization criterion we consider the exponential certainty equivalent. The discounted reward case on a finite and an infinite time horizon is considered, as well as the ergodic reward case. Under continuity and compactness conditions we prove that the value of the game exists and solves the Shapley equation and we show the existence of optimal (non-stationary) strategies. In the ergodic reward case we work with a local minorization property and a Lyapunov condition and show that the value of the game solves the Poisson equation. Moreover, we prove the existence of optimal stationary strategies. A simple example highlights the influence of the risk-sensitivity parameter. Our results generalize findings in Basu/Ghosh 2014 and answer an open question posed there.

Submitted 29 June, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

MSC Class: 91A15; 91A50; 90C40; 60J05

Journal ref: Stochastic Processes and their Applications 127(2), 622-642 (2017)

arXiv:1504.03530 [pdf, other]

doi 10.1287/moor.2016.0844

Partially Observable Risk-Sensitive Markov Decision Processes

Authors: Nicole Bäuerle, Ulrich Rieder

Abstract: We consider the problem of minimizing a certainty equivalent of the total or discounted cost over a finite and an infinite time horizon which is generated by a Partially Observable Markov Decision Process (POMDP). The certainty equivalent is defined by $U^{-1}(EU(Y))$ where $U$ is an increasing function. In contrast to a risk-neutral decision maker this optimization criterion takes the variability of the cost into account. It contains as a special case the classical risk-sensitive optimization criterion with an exponential utility. We show that this optimization problem can be solved by embedding the problem into a completely observable Markov Decision Process with extended state space and give conditions under which an optimal policy exists. The state space has to be extended by the joint conditional distribution of current unobserved state and accumulated cost. In case of an exponential utility, the problem simplifies considerably and we rediscover what in previous literature has been named information state. However, since we do not use any change of measure techniques here, our approach is simpler. A small numerical example, namely the classical repeated casino game with unknown success probability, is considered to illustrate the influence of the certainty equivalent and its parameters.

Submitted 29 November, 2016; v1 submitted 14 April, 2015; originally announced April 2015.

Journal ref: Mathematics of Operations Research 42(4):1180-1196, 2017

arXiv:math/0607098 [pdf, ps, other]

doi 10.1214/105051606000000105

Average optimality for continuous-time Markov decision processes in polish spaces

Authors: Xianping Guo, Ulrich Rieder

Abstract: This paper is devoted to studying the average optimality in continuous-time Markov decision processes with fairly general state and action spaces. The criterion to be maximized is expected average rewards. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We first provide two optimality inequalities with opposed directions, and also give suitable conditions under which the existence of solutions to the two optimality inequalities is ensured. Then, from the two optimality inequalities we prove the existence of optimal (deterministic) stationary policies by using the Dynkin formula. Moreover, we present a "semimartingale characterization" of an optimal stationary policy. Finally, we use a generalized Potlach process with control to illustrate the difference between our conditions and those in the previous literature, and then further apply our results to average optimal control problems of generalized birth-death systems, upwardly skip-free processes and two queueing systems. The approach developed in this paper is slightly different from the "optimality inequality approach" widely used in the previous literature.

Submitted 4 July, 2006; originally announced July 2006.

Comments: Published at http://dx.doi.org/10.1214/105051606000000105 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP0155

MSC Class: 90C40; 93E20 (Primary)

Journal ref: Annals of Applied Probability 2006, Vol. 16, No. 2, 730-756

Showing 1–5 of 5 results for author: Rieder, U