-
Yet Another Distributional Bellman Equation
Authors:
Nicole Bäuerle,
Tamara Göll,
Anna Jaśkiewicz
Abstract:
We consider non-standard Markov Decision Processes (MDPs) where the target function is not only a simple expectation of the accumulated reward. Instead, we consider rather general functionals of the joint distribution of terminal state and accumulated reward which have to be optimized. For finite state and compact action space, we show how to solve these problems by defining a lifted MDP whose state space is the space of distributions over the true states of the process. We derive a Bellman equation in this setting, which can be considered as a distributional Bellman equation. Well-known cases like the standard MDP and quantile MDPs are shown to be special examples of our framework. We also apply our model to a variant of an optimal transport problem.
Submitted 27 May, 2025;
originally announced May 2025.
-
Stochastic dynamic programming under recursive Epstein-Zin preferences
Authors:
Anna Jaśkiewicz,
Andrzej S. Nowak
Abstract:
This paper investigates discrete-time Markov decision processes with recursive utilities (or payoffs) defined by the classic CES aggregator and the Kreps-Porteus certainty equivalent operator. According to the classification introduced by Marinacci and Montrucchio, some aggregators that we consider are Thompson and some of them are neither Thompson nor Blackwell. We focus on the existence and uniqueness of a solution to the Bellman equation. Since the per-period utilities can be unbounded, we work with the weighted supremum norm. Our paper makes three major points for such models. Firstly, we prove that the Bellman equation can be obtained by the Banach fixed point theorem for contraction mappings acting on a standard complete metric space. Secondly, we need not assume any boundary conditions, which are present when the Thompson metric or Du's theorem is used. Thirdly, our results give better bounds for the geometric convergence of the value iteration algorithm than those obtained by Du's fixed point theorem. Moreover, our techniques allow us to derive the Bellman equation for some values of parameters in the CES aggregator and the Kreps-Porteus certainty equivalent that cannot be solved by Du's theorem for increasing and convex or concave operators acting on an ordered Banach space.
Submitted 9 July, 2025; v1 submitted 24 October, 2024;
originally announced October 2024.
-
Time-consistency in the mean-variance problem: A new perspective
Authors:
Nicole Bäuerle,
Anna Jaśkiewicz
Abstract:
We investigate discrete-time mean-variance portfolio selection problems viewed as a Markov decision process. We transform the problems into a new model with deterministic transition function for which the Bellman optimality equation holds. In this way, we can solve the problem recursively and obtain a time-consistent solution, that is, an optimal solution that meets the Bellman optimality principle. We apply our technique to solve a more general framework explicitly.
Submitted 26 January, 2023;
originally announced January 2023.
-
On approximate and weak correlated equilibria in constrained discounted stochastic games
Authors:
Anna Jaśkiewicz,
Andrzej S. Nowak
Abstract:
In this paper, we consider constrained discounted stochastic games with a countably generated state space and norm continuous transition probability having a density function. We prove existence of approximate stationary equilibria and stationary weak correlated equilibria. Our results imply the existence of stationary Nash equilibrium in $ARAT$ stochastic games.
Submitted 20 October, 2022; v1 submitted 4 January, 2022;
originally announced January 2022.
-
Constrained discounted stochastic games
Authors:
Anna Jaśkiewicz,
Andrzej S. Nowak
Abstract:
In this paper, we consider a large class of constrained non-cooperative stochastic Markov games with countable state spaces and discounted cost criteria. In the one-player case, i.e., constrained discounted Markov decision models, it is possible to formulate a static optimisation problem whose solution determines a stationary optimal strategy (alias control or policy) in the dynamical infinite horizon model. This solution lies in the compact convex set of all occupation measures induced by strategies, defined on the set of state-action pairs. In the case of n-person discounted games, the occupation measures are induced by strategies of all players. Therefore, it is difficult to generalise the approach for constrained discounted Markov decision processes directly. It is not clear how to define the domain for the best-response correspondence whose fixed point induces a stationary equilibrium in the Markov game. This domain should be the Cartesian product of compact convex sets in locally convex topological vector spaces. One of our main results shows how to overcome this difficulty and define a constrained non-cooperative static game whose Nash equilibrium induces a stationary Nash equilibrium in the Markov game. This is done for games with bounded cost functions and positive initial state distribution. An extension to a class of Markov games with unbounded costs and arbitrary initial state distribution relies on approximation of the unbounded game by bounded ones with positive initial state distributions. In the unbounded case, we assume the uniform integrability of the discounted costs with respect to all probability measures induced by strategies of the players, defined on the space of plays (histories) of the game. Our assumptions are weaker than those applied in earlier works on discounted dynamic programming or stochastic games using so-called weighted norm approaches.
Submitted 15 December, 2021;
originally announced December 2021.
-
Stochastic dynamic programming with non-linear discounting
Authors:
Nicole Bäuerle,
Anna Jaśkiewicz,
Andrzej S. Nowak
Abstract:
In this paper, we study a Markov decision process with a non-linear discount function and with a Borel state space. We define a recursive discounted utility, which resembles non-additive utility functions considered in a number of models in economics. Non-additivity here follows from non-linearity of the discount function. Our study is complementary to the work of Jaśkiewicz, Matkowski and Nowak (Math. Oper. Res. 38 (2013), 108-121), where also non-linear discounting is used in the stochastic setting, but the expectation of utilities aggregated on the space of all histories of the process is applied leading to a non-stationary dynamic programming model. Our aim is to prove that in the recursive discounted utility case the Bellman equation has a solution and there exists an optimal stationary policy for the problem in the infinite time horizon. Our approach includes two cases: $(a)$ when the one-stage utility is bounded on both sides by a weight function multiplied by some positive and negative constants, and $(b)$ when the one-stage utility is unbounded from below.
Submitted 4 November, 2020;
originally announced November 2020.
-
Constrained discounted Markov decision processes with Borel state spaces
Authors:
Eugene A. Feinberg,
Anna Jaśkiewicz,
Andrzej S. Nowak
Abstract:
We study discrete-time discounted constrained Markov decision processes (CMDPs) on Borel spaces with unbounded reward functions. In our approach the transition probability functions are weakly or set-wise continuous. The reward functions are upper semicontinuous in state-action pairs or semicontinuous in actions. Our aim is to study models with unbounded reward functions, which are often encountered in applications, e.g., in consumption/investment problems. We provide some general assumptions under which the optimization problems in CMDPs are solvable in the class of stationary randomized policies. Then, we indicate that if the initial distribution and transition probabilities are non-atomic, then using a general purification result of Feinberg and Piunovskiy, stationary optimal policies can be deterministic. Our main results are illustrated by five examples.
Submitted 27 March, 2019; v1 submitted 1 June, 2018;
originally announced June 2018.
-
On a generalization of the Dvoretzky-Wald-Wolfowitz theorem with an application to a robust optimization problem
Authors:
Anna Jaśkiewicz,
Andrzej S. Nowak
Abstract:
A generalization of the Dvoretzky-Wald-Wolfowitz theorem to the case of conditional expectations is provided assuming that the $σ$-field on the state space has no conditional atoms.
Submitted 20 December, 2017;
originally announced December 2017.
-
Optimal Dividend Payout Model with Risk Sensitive Preferences
Authors:
Nicole Bäuerle,
Anna Jaśkiewicz
Abstract:
We consider a discrete-time dividend payout problem with risk sensitive shareholders. It is assumed that they are equipped with a risk aversion coefficient and construct their discounted payoff with the help of the exponential premium principle. This leads to a non-expected recursive utility of the dividends. Within such a framework not only the expected value of the dividends is taken into account but also their variability. Our approach is motivated by a remark in Gerber and Shiu (2004). We deal with the finite and infinite time horizon problems and prove that, even in a general setting, the optimal dividend policy is a band policy. We also show that the policy improvement algorithm can be used to obtain the optimal policy and the corresponding value function. Next, an explicit example is provided, in which the optimal policy of a barrier type is shown to exist. Finally, we present some numerical studies and discuss the influence of the risk sensitive parameter on the optimal dividend policy.
Submitted 7 March, 2017; v1 submitted 31 May, 2016;
originally announced May 2016.
-
Stochastic Optimal Growth Model with Risk Sensitive Preferences
Authors:
Nicole Bäuerle,
Anna Jaśkiewicz
Abstract:
This paper studies a one-sector optimal growth model with i.i.d. productivity shocks that are allowed to be unbounded. The utility function is assumed to be non-negative and unbounded from above. The novel feature in our framework is that the agent has risk sensitive preferences in the sense of Hansen and Sargent (1995). Under mild assumptions imposed on the productivity and utility functions we prove that the maximal discounted non-expected utility in the infinite time horizon satisfies the optimality equation and the agent possesses a stationary optimal policy. A new point used in our analysis is an inequality for the so-called associated random variables. We also establish the Euler equation that incorporates the solution to the optimality equation.
Submitted 18 September, 2015;
originally announced September 2015.
-
On the equivalence of two expected average cost criteria for semi-Markov control processes
Authors:
Anna Jaśkiewicz
Abstract:
The two expected average costs used in the theory of semi-Markov control processes with a Borel state space are considered. Under some stochastic stability conditions, we prove that the two criteria are equivalent in the sense that they lead to the same optimality equation.
Submitted 19 September, 2013;
originally announced September 2013.
-
Risk-Sensitive Dividend Problems
Authors:
Nicole Bäuerle,
Anna Jaśkiewicz
Abstract:
We consider a discrete-time version of the popular optimal dividend pay-out problem in risk theory. The novel aspect of our approach is that we allow for a risk averse insurer, i.e., instead of maximising the expected discounted dividends until ruin we maximise the expected utility of discounted dividends until ruin. This task has been proposed as an open problem in H. Gerber and E. Shiu (2004). The model in a continuous-time Brownian motion setting with the exponential utility function has been analysed in P. Grandits, F. Hubalek, W. Schachermayer and M. Zigo (2007). Nevertheless, a complete solution has not been provided. In this work, we instead solve the problem in a discrete-time setup for the exponential and the power utility functions and give the structure of optimal history-dependent dividend policies. We make use of certain ideas studied earlier in N. Bäuerle and U. Rieder (2013), where Markov decision processes with general utility functions were treated. Our analysis, however, includes new aspects, since the reward functions in this case are not bounded.
Submitted 2 May, 2014; v1 submitted 19 June, 2013;
originally announced June 2013.
-
Average optimality for risk-sensitive control with general state space
Authors:
Anna Jaśkiewicz
Abstract:
This paper deals with discrete-time Markov control processes on a general state space. A long-run risk-sensitive average cost criterion is used as a performance measure. The one-step cost function is nonnegative and possibly unbounded. Using the vanishing discount factor approach, the optimality inequality and an optimal stationary strategy for the decision maker are established.
Submitted 3 April, 2007;
originally announced April 2007.