-
Data-Driven Exploration for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems
Authors:
Yilie Huang,
Xun Yu Zhou
Abstract:
We study reinforcement learning (RL) for the same class of continuous-time stochastic linear--quadratic (LQ) control problems as in \cite{huang2024sublinear}, where volatilities depend on both states and controls while states are scalar-valued and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic and policy variance by the actor. Unlike the constant or deterministic exploration schedules employed in \cite{huang2024sublinear}, which require extensive tuning in implementation and ignore learning progress during iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived only with fixed exploration schedules. Numerical experiments demonstrate that adaptive exploration accelerates convergence and improves regret performance compared to the non-adaptive model-free and model-based counterparts.
Submitted 30 June, 2025;
originally announced July 2025.
-
Mean--Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study
Authors:
Yilie Huang,
Yanwei Jia,
Xun Yu Zhou
Abstract:
We study continuous-time mean--variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes yet the coefficients of these processes are unknown. Based on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL algorithm that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. For multi-stock Black--Scholes markets without factors, we further devise a baseline algorithm and prove its performance guarantee by deriving a sublinear regret bound in terms of Sharpe ratio. For performance enhancement and practical implementation, we modify the baseline algorithm into four variants, and carry out an extensive empirical study to compare their performance, in terms of a host of common metrics, with a large number of widely used portfolio allocation strategies on S\&P 500 constituents. The results demonstrate that the continuous-time RL strategies are consistently among the best especially in a volatile bear market, and decisively outperform the model-based continuous-time counterparts by significant margins.
Submitted 8 December, 2024;
originally announced December 2024.
-
Regret of exploratory policy improvement and $q$-learning
Authors:
Wenpin Tang,
Xun Yu Zhou
Abstract:
We study the convergence of $q$-learning and related algorithms introduced by Jia and Zhou (J. Mach. Learn. Res., 24 (2023), 161) for controlled diffusion processes. Under suitable conditions on the growth and regularity of the model parameters, we provide a quantitative error and regret analysis of both the exploratory policy improvement algorithm and the $q$-learning algorithm.
Submitted 2 November, 2024;
originally announced November 2024.
-
Reward-Directed Score-Based Diffusion Models via q-Learning
Authors:
Xuefeng Gao,
Jiale Zha,
Xun Yu Zhou
Abstract:
We propose a new reinforcement learning (RL) formulation for training continuous-time score-based diffusion models for generative AI to generate samples that maximize reward functions while keeping the generated distributions close to the unknown target data distributions. Different from most existing studies, our formulation does not involve any pretrained model for the unknown score functions of the noise-perturbed data distributions. We present an entropy-regularized continuous-time RL problem and show that the optimal stochastic policy has a Gaussian distribution with a known covariance matrix. Based on this result, we parameterize the mean of Gaussian policies and develop an actor-critic type (little) q-learning algorithm to solve the RL problem. A key ingredient in our algorithm design is to obtain noisy observations from the unknown score function via a ratio estimator. Numerically, we show the effectiveness of our approach by comparing its performance with two state-of-the-art RL methods that fine-tune pretrained models. Finally, we discuss extensions of our RL formulation to probability flow ODE implementation of diffusion models and to conditional diffusion models.
Submitted 7 September, 2024;
originally announced September 2024.
-
Learning to Optimally Stop Diffusion Processes, with Financial Applications
Authors:
Min Dai,
Yu Sun,
Zuo Quan Xu,
Xun Yu Zhou
Abstract:
We study optimal stopping for diffusion processes with unknown model primitives within the continuous-time reinforcement learning (RL) framework developed by Wang et al. (2020), and present applications to option pricing and portfolio choice. By penalizing the corresponding variational inequality formulation, we transform the stopping problem into a stochastic optimal control problem with two actions. We then randomize controls into Bernoulli distributions and add an entropy regularizer to encourage exploration. We derive a semi-analytical optimal Bernoulli distribution, based on which we devise RL algorithms using the martingale approach established in Jia and Zhou (2022a), and prove a policy improvement theorem. We demonstrate the effectiveness of the algorithms in pricing finite-horizon American put options and in solving Merton's problem with transaction costs, and show that both the offline and online algorithms achieve high accuracy in learning the value functions and characterizing the associated free boundaries.
Submitted 8 September, 2024; v1 submitted 17 August, 2024;
originally announced August 2024.
-
Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems
Authors:
Yilie Huang,
Yanwei Jia,
Xun Yu Zhou
Abstract:
We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds.
Submitted 2 July, 2025; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Reinforcement Learning for Jump-Diffusions, with Financial Applications
Authors:
Xuefeng Gao,
Lingfei Li,
Xun Yu Zhou
Abstract:
We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and $q$-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. We investigate as an application the mean--variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps. Finally, we present a detailed study on applying the general theory to option hedging.
Submitted 7 January, 2025; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Robust utility maximization with intractable claims
Authors:
Yunhong Li,
Zuo Quan Xu,
Xun Yu Zhou
Abstract:
We study a continuous-time expected utility maximization problem in which the investor at maturity receives the value of a contingent claim in addition to the investment payoff from the financial market. The investor knows nothing about the claim other than its probability distribution, hence an ``intractable claim''. In view of the lack of necessary information about the claim, we consider a robust formulation to maximize her utility in the worst scenario. We apply the quantile formulation to solve the problem, expressing the quantile function of the optimal terminal investment income as the solution of certain variational inequalities of ordinary differential equations and obtaining the resulting optimal trading strategy. In the case of an exponential utility, the problem reduces to a (non-robust) rank--dependent utility maximization with probability distortion whose solution is available in the literature. The results can also be used to determine the utility indifference price of the intractable claim.
Submitted 14 July, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
Variable Clustering via Distributionally Robust Nodewise Regression
Authors:
Kaizheng Wang,
Xiao Xu,
Xun Yu Zhou
Abstract:
We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression. To solve the latter problem, we derive a convex relaxation, provide guidance on selecting the size of the robust region, and hence the regularization weighting parameter, based on the data, and propose an ADMM algorithm for implementation. We validate our method in an extensive simulation study. Finally, we propose and apply a variant of our method to stock return data, obtain interpretable clusters that facilitate portfolio selection and compare its out-of-sample performance with other clustering methods in an empirical study.
Submitted 20 December, 2022; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Square-root regret bounds for continuous-time episodic Markov decision processes
Authors:
Xuefeng Gao,
Xun Yu Zhou
Abstract:
We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. In contrast to discrete-time MDPs, the inter-transition times of a continuous-time MDP are exponentially distributed with rate parameters depending on the state--action pair at each transition. We present a learning algorithm based on the methods of value iteration and upper confidence bound. We derive an upper bound on the worst-case expected regret for the proposed algorithm, and establish a worst-case lower bound; both bounds are of the order of the square root of the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.
Submitted 2 October, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
$g$-Expectation of Distributions
Authors:
Mingyu Xu,
Zuo Quan Xu,
Xun Yu Zhou
Abstract:
We define $g$-expectation of a distribution as the infimum of the $g$-expectations of all the terminal random variables sharing that distribution. We present two special cases for nonlinear $g$ where the $g$-expectation of distributions can be explicitly derived. As a related problem, we introduce the notion of law-invariant $g$-expectation and provide its sufficient conditions. Examples of application in financial dynamic portfolio choice are supplied.
Submitted 12 August, 2022;
originally announced August 2022.
-
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Authors:
Xuefeng Gao,
Xun Yu Zhou
Abstract:
We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret lower bounds that are logarithmic in the time horizon. Moreover, we design a learning algorithm and establish a finite-time regret bound that achieves the logarithmic growth rate. Our analysis builds upon upper confidence reinforcement learning, a delicate estimation of the mean holding times, and stochastic comparison of point processes.
Submitted 2 July, 2024; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Exploratory HJB equations and their convergence
Authors:
Wenpin Tang,
Paul Yuming Zhang,
Xun Yu Zhou
Abstract:
We study the exploratory Hamilton--Jacobi--Bellman (HJB) equation arising from the entropy-regularized exploratory control problem, which was formulated by Wang, Zariphopoulou and Zhou (J. Mach. Learn. Res., 21, 2020) in the context of reinforcement learning in continuous time and space. We establish the well-posedness and regularity of the viscosity solution to the equation, as well as the convergence of the exploratory control problem to the classical stochastic control problem when the level of exploration decays to zero. We then apply the general results to the exploratory temperature control problem, which was introduced by Gao, Xu and Zhou (arXiv:2005.04057, 2020) to design an endogenous temperature schedule for simulated annealing (SA) in the context of non-convex optimization. We derive an explicit rate of convergence for this problem as exploration diminishes to zero, and find that the steady state of the optimally controlled process exists, which is however neither a Dirac mass on the global optimum nor a Gibbs measure.
Submitted 21 September, 2021;
originally announced September 2021.
-
Who Are I: Time Inconsistency and Intrapersonal Conflict and Reconciliation
Authors:
Xue Dong He,
Xun Yu Zhou
Abstract:
Time inconsistency is prevalent in dynamic choice problems: a plan of actions to be taken in the future that is optimal for an agent today may not be optimal for the same agent in the future. If the agent is aware of this intra-personal conflict but unable to commit herself in the future to following the optimal plan today, the rational strategy for her today is to reconcile with her future selves, namely to correctly anticipate her actions in the future and then act today accordingly. Such a strategy is named intra-personal equilibrium and has been studied since as early as the 1950s. A rigorous treatment in continuous-time settings, however, had not been available until a decade ago. Since then, the study of intra-personal equilibrium for time-inconsistent problems in continuous time has grown rapidly. In this chapter, we review the classical results and some recent developments in this literature.
Submitted 4 May, 2021;
originally announced May 2021.
-
Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law
Authors:
Wenpin Tang,
Xun Yu Zhou
Abstract:
We study the convergence rate of continuous-time simulated annealing $(X_t; \, t \ge 0)$ and its discretization $(x_k; \, k =0,1, \ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb{P}(f(X_t) > \min f + \delta)$ (resp. $\mathbb{P}(f(x_k) > \min f + \delta)$) decays polynomially in time (resp. in cumulative step size), and provide an explicit rate as a function of the model parameters. Our argument applies the recent development on functional inequalities for the Gibbs measure at low temperatures -- the Eyring--Kramers law. In the discrete setting, we obtain a condition on the step size to ensure the convergence.
Submitted 9 February, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.
-
State-Dependent Temperature Control for Langevin Diffusions
Authors:
Xuefeng Gao,
Zuo Quan Xu,
Xun Yu Zhou
Abstract:
We study the temperature control problem for Langevin diffusions in the context of non-convex optimization. The classical optimal control of such a problem is of the bang-bang type, which is overly sensitive to errors. A remedy is to allow the diffusions to explore other temperature values and hence smooth out the bang-bang control. We accomplish this by a stochastic relaxed control formulation incorporating randomization of the temperature control and regularizing its entropy. We derive a state-dependent, truncated exponential distribution, which can be used to sample temperatures in a Langevin algorithm, in terms of the solution to an HJB partial differential equation. We carry out a numerical experiment on a one-dimensional baseline example, in which the HJB equation can be easily solved, to compare the performance of the algorithm with three other available algorithms in search of a global optimum.
Submitted 17 December, 2021; v1 submitted 15 November, 2020;
originally announced November 2020.
-
Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework
Authors:
Haoran Wang,
Xun Yu Zhou
Abstract:
We approach the continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as the exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm outperforms both an adaptive-control-based method and a deep-neural-network-based algorithm by a large margin in our simulations.
Submitted 4 May, 2019; v1 submitted 25 April, 2019;
originally announced April 2019.
-
Time-Inconsistent Stochastic Linear--Quadratic Control: Characterization and Uniqueness of Equilibrium
Authors:
Ying Hu,
Hanqing Jin,
Xun Yu Zhou
Abstract:
In this paper, we continue our study on a general time-inconsistent stochastic linear--quadratic (LQ) control problem originally formulated in [6]. We derive a necessary and sufficient condition for equilibrium controls via a flow of forward--backward stochastic differential equations. When the state is one dimensional and the coefficients in the problem are all deterministic, we prove that the explicit equilibrium control constructed in \cite{HJZ} is indeed unique. Our proof is based on the derived equivalent condition for equilibria as well as a stochastic version of the Lebesgue differentiation theorem. Finally, we show that the equilibrium strategy is unique for a mean--variance portfolio selection model in a complete financial market where the risk-free rate is a deterministic function of time but all the other market parameters are possibly stochastic processes.
Submitted 26 May, 2015; v1 submitted 5 April, 2015;
originally announced April 2015.
-
A Note on Indefinite Stochastic Riccati Equations
Authors:
Zhongmin Qian,
Xun Yu Zhou
Abstract:
An indefinite stochastic Riccati Equation is a matrix-valued, highly nonlinear backward stochastic differential equation together with an algebraic, matrix positive definiteness constraint. We introduce a new approach to solve a class of such equations (including the existence of solutions) driven by one-dimensional Brownian motion. The idea is to replace the original equation by a system of BSDEs (without involving any algebraic constraint) whose existence of solutions automatically enforces the original algebraic constraint to be satisfied.
Submitted 17 March, 2012;
originally announced March 2012.
-
Time-Inconsistent Stochastic Linear--Quadratic Control
Authors:
Ying Hu,
Hanqing Jin,
Xun Yu Zhou
Abstract:
In this paper, we formulate a general time-inconsistent stochastic linear--quadratic (LQ) control problem. The time-inconsistency arises from the presence of a quadratic term of the expected state as well as a state-dependent term in the objective functional. We define an equilibrium, instead of optimal, solution within the class of open-loop controls, and derive a sufficient condition for equilibrium controls via a flow of forward--backward stochastic differential equations. When the state is one dimensional and the coefficients in the problem are all deterministic, we find an explicit equilibrium control. As an application, we then consider a mean-variance portfolio selection model in a complete financial market where the risk-free rate is a deterministic function of time but all the other market parameters are possibly stochastic processes. Applying the general sufficient condition, we obtain explicit equilibrium strategies when the risk premium is both deterministic and stochastic.
Submitted 3 November, 2011;
originally announced November 2011.
-
Optimal stopping under probability distortion
Authors:
Zuo Quan Xu,
Xun Yu Zhou
Abstract:
We formulate an optimal stopping problem for a geometric Brownian motion where the probability scale is distorted by a general nonlinear function. The problem is inherently time inconsistent due to the Choquet integration involved. We develop a new approach, based on a reformulation of the problem where one optimally chooses the probability distribution or quantile function of the stopped state. An optimal stopping time can then be recovered from the obtained distribution/quantile function, either in a straightforward way for several important cases or in general via the Skorokhod embedding. This approach enables us to solve the problem in a fairly general manner with different shapes of the payoff and probability distortion functions. We also discuss economic interpretations of the results. In particular, we justify several liquidation strategies widely adopted in stock trading, including those of "buy and hold", "cut loss or take profit", "cut loss and let profit run" and "sell on a percentage of historical high".
Submitted 12 February, 2013; v1 submitted 9 March, 2011;
originally announced March 2011.
-
A Convex Stochastic Optimization Problem Arising from Portfolio Selection
Authors:
Hanqing Jin,
Zuo Quan Xu,
Xun Yu Zhou
Abstract:
A continuous-time financial portfolio selection model with expected utility maximization typically boils down to solving a (static) convex stochastic optimization problem in terms of the terminal wealth, with a budget constraint. In the literature, the latter is solved by assuming {\it a priori} that the problem is well-posed (i.e., the supremum value is finite) and a Lagrange multiplier exists (and as a consequence the optimal solution is attainable). In this paper it is first shown, via various counter-examples, that neither of these two assumptions needs to hold, and an optimal solution does not necessarily exist. These anomalies in turn have important interpretations in, and impacts on, portfolio selection modeling and solutions. Relations among the non-existence of the Lagrange multiplier, the ill-posedness of the problem, and the non-attainability of an optimal solution are then investigated. Finally, explicit and easily verifiable conditions are derived which lead to finding the unique optimal solution.
Submitted 27 September, 2007;
originally announced September 2007.
-
Continuous-time mean-variance efficiency: the 80% rule
Authors:
Xun Li,
Xun Yu Zhou
Abstract:
This paper studies a continuous-time market where an agent, having specified an investment horizon and a targeted terminal mean return, seeks to minimize the variance of the return. The optimal portfolio of such a problem is called mean-variance efficient à la Markowitz. It is shown that, when the market coefficients are deterministic functions of time, a mean-variance efficient portfolio realizes the (discounted) targeted return on or before the terminal date with a probability greater than 0.8072. This number is universal irrespective of the market parameters, the targeted return and the length of the investment horizon.
Submitted 9 February, 2007;
originally announced February 2007.
-
Interplay between dividend rate and business constraints for a financial corporation
Authors:
Tahir Choulli,
Michael Taksar,
Xun Yu Zhou
Abstract:
We study a model of a corporation that can choose among various production/business policies with different expected profits and risks. In the model there are restrictions on the dividend distribution rates as well as restrictions on the risk the company can undertake. The objective is to maximize the expected present value of the total dividend distributions. We outline the corresponding Hamilton-Jacobi-Bellman equation, compute explicitly the optimal return function and determine the optimal policy. As a consequence of these results, the way the dividend rate and business constraints affect the optimal policy is revealed. In particular, we show that under certain relationships between the constraints and the exogenous parameters of the random processes that govern the returns, some business activities might be redundant, that is, under the optimal policy they will never be used in any scenario.
Submitted 24 March, 2005;
originally announced March 2005.