Showing 1–24 of 24 results for author: Zhou, X Y

  1. arXiv:2507.00358

    cs.LG cs.AI eess.SY math.OC

    Data-Driven Exploration for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems

    Authors: Yilie Huang, Xun Yu Zhou

    Abstract: We study reinforcement learning (RL) for the same class of continuous-time stochastic linear--quadratic (LQ) control problems as in Huang, Jia and Zhou (2024), where volatilities depend on both states and controls while states are scalar-valued and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the cri…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: 36 pages, 10 figures

  2. arXiv:2412.16175

    q-fin.PM cs.LG eess.SY math.OC

    Mean--Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study

    Authors: Yilie Huang, Yanwei Jia, Xun Yu Zhou

    Abstract: We study continuous-time mean--variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes yet the coefficients of these processes are unknown. Based on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL algorithm that learns the pre-committed in…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: 76 pages, 5 figures, 7 tables

    MSC Class: 68T05; 91G10; 68Q25; 93E35; 93E20

  3. arXiv:2411.01302

    cs.LG math.OC math.PR

    Regret of exploratory policy improvement and $q$-learning

    Authors: Wenpin Tang, Xun Yu Zhou

    Abstract: We study the convergence of $q$-learning and related algorithms introduced by Jia and Zhou (J. Mach. Learn. Res., 24 (2023), 161) for controlled diffusion processes. Under suitable conditions on the growth and regularity of the model parameters, we provide a quantitative error and regret analysis of both the exploratory policy improvement algorithm and the $q$-learning algorithm.

    Submitted 2 November, 2024; originally announced November 2024.

    Comments: 23 pages, 1 figure

  4. arXiv:2409.04832

    cs.LG cs.AI math.OC

    Reward-Directed Score-Based Diffusion Models via q-Learning

    Authors: Xuefeng Gao, Jiale Zha, Xun Yu Zhou

    Abstract: We propose a new reinforcement learning (RL) formulation for training continuous-time score-based diffusion models for generative AI to generate samples that maximize reward functions while keeping the generated distributions close to the unknown target data distributions. Different from most existing studies, our formulation does not involve any pretrained model for the unknown score functions of…

    Submitted 7 September, 2024; originally announced September 2024.

  5. arXiv:2408.09242

    math.OC q-fin.MF q-fin.PR

    Learning to Optimally Stop Diffusion Processes, with Financial Applications

    Authors: Min Dai, Yu Sun, Zuo Quan Xu, Xun Yu Zhou

    Abstract: We study optimal stopping for diffusion processes with unknown model primitives within the continuous-time reinforcement learning (RL) framework developed by Wang et al. (2020), and present applications to option pricing and portfolio choice. By penalizing the corresponding variational inequality formulation, we transform the stopping problem into a stochastic optimal control problem with two acti…

    Submitted 8 September, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: 35 pages, 9 figures

  6. arXiv:2407.17226

    cs.LG cs.AI eess.SY math.OC

    Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

    Authors: Yilie Huang, Yanwei Jia, Xun Yu Zhou

    Abstract: We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an…

    Submitted 2 July, 2025; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 42 pages, 4 figures. Accepted for publication in SIAM Journal on Control and Optimization (2025)

  7. arXiv:2405.16449

    cs.LG math.OC q-fin.MF

    Reinforcement Learning for Jump-Diffusions, with Financial Applications

    Authors: Xuefeng Gao, Lingfei Li, Xun Yu Zhou

    Abstract: We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the explora…

    Submitted 7 January, 2025; v1 submitted 26 May, 2024; originally announced May 2024.

  8. arXiv:2304.06938

    q-fin.MF math.OC q-fin.PM q-fin.RM

    Robust utility maximization with intractable claims

    Authors: Yunhong Li, Zuo Quan Xu, Xun Yu Zhou

    Abstract: We study a continuous-time expected utility maximization problem in which the investor at maturity receives the value of a contingent claim in addition to the investment payoff from the financial market. The investor knows nothing about the claim other than its probability distribution, hence an ``intractable claim''. In view of the lack of necessary information about the claim, we consider a robu…

    Submitted 14 July, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    MSC Class: 91B28; 91G10; 35Q91

  9. arXiv:2212.07944

    cs.LG math.OC q-fin.CP q-fin.PM q-fin.ST

    Variable Clustering via Distributionally Robust Nodewise Regression

    Authors: Kaizheng Wang, Xiao Xu, Xun Yu Zhou

    Abstract: We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression. To solve the latter problem, we derive a convex relaxation, provide guidance on selecting the size of the robust region, and hence the regularization weighting parameter, based on the data, and propose an ADMM…

    Submitted 20 December, 2022; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: 34 pages

  10. arXiv:2210.00832

    cs.LG math.OC

    Square-root regret bounds for continuous-time episodic Markov decision processes

    Authors: Xuefeng Gao, Xun Yu Zhou

    Abstract: We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. In contrast to discrete-time MDPs, the inter-transition times of a continuous-time MDP are exponentially distributed with rate parameters depending on the state--action pair at each transition. We present a learning algorithm based on the methods of value iteration and upper…

    Submitted 2 October, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  11. arXiv:2208.06535

    math.PR math.OC q-fin.MF q-fin.RM

    $g$-Expectation of Distributions

    Authors: Mingyu Xu, Zuo Quan Xu, Xun Yu Zhou

    Abstract: We define $g$-expectation of a distribution as the infimum of the $g$-expectations of all the terminal random variables sharing that distribution. We present two special cases for nonlinear $g$ where the $g$-expectation of distributions can be explicitly derived. As a related problem, we introduce the notion of law-invariant $g$-expectation and provide its sufficient conditions. Examples of applic…

    Submitted 12 August, 2022; originally announced August 2022.

  12. arXiv:2205.11168

    cs.LG math.OC stat.ML

    Logarithmic regret bounds for continuous-time average-reward Markov decision processes

    Authors: Xuefeng Gao, Xun Yu Zhou

    Abstract: We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret low…

    Submitted 2 July, 2024; v1 submitted 23 May, 2022; originally announced May 2022.

  13. arXiv:2109.10269

    math.OC math.AP math.PR

    Exploratory HJB equations and their convergence

    Authors: Wenpin Tang, Paul Yuming Zhang, Xun Yu Zhou

    Abstract: We study the exploratory Hamilton--Jacobi--Bellman (HJB) equation arising from the entropy-regularized exploratory control problem, which was formulated by Wang, Zariphopoulou and Zhou (J. Mach. Learn. Res., 21, 2020) in the context of reinforcement learning in continuous time and space. We establish the well-posedness and regularity of the viscosity solution to the equation, as well as the conver…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: 31 pages

    MSC Class: 35F21; 60J60; 93E15; 93E20

  14. arXiv:2105.01829

    math.OC q-fin.MF

    Who Are I: Time Inconsistency and Intrapersonal Conflict and Reconciliation

    Authors: Xue Dong He, Xun Yu Zhou

    Abstract: Time inconsistency is prevalent in dynamic choice problems: a plan of actions to be taken in the future that is optimal for an agent today may not be optimal for the same agent in the future. If the agent is aware of this intra-personal conflict but unable to commit herself in the future to following the optimal plan today, the rational strategy for her today is to reconcile with her future selves…

    Submitted 4 May, 2021; originally announced May 2021.

  15. arXiv:2102.02339

    math.PR math.ST stat.ML

    Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law

    Authors: Wenpin Tang, Xun Yu Zhou

    Abstract: We study the convergence rate of continuous-time simulated annealing $(X_t; \, t \ge 0)$ and its discretization $(x_k; \, k =0,1, \ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb{P}(f(X_t) > \min f + \delta)$ (resp. $\mathbb{P}(f(x_k) > \min f + \delta)$) decays polynomially in time (resp. in cumulative step size), and provide an explicit rate as…

    Submitted 9 February, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: 19 pages, 1 figure

  16. arXiv:2011.07456

    math.OC cs.LG

    State-Dependent Temperature Control for Langevin Diffusions

    Authors: Xuefeng Gao, Zuo Quan Xu, Xun Yu Zhou

    Abstract: We study the temperature control problem for Langevin diffusions in the context of non-convex optimization. The classical optimal control of such a problem is of the bang-bang type, which is overly sensitive to errors. A remedy is to allow the diffusions to explore other temperature values and hence smooth out the bang-bang control. We accomplish this by a stochastic relaxed control formulation in…

    Submitted 17 December, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

  17. arXiv:1904.11392

    q-fin.PM cs.CE cs.LG math.OC

    Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

    Authors: Haoran Wang, Xun Yu Zhou

    Abstract: We approach the continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connecti…

    Submitted 4 May, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: 39 pages, 5 figures

    MSC Class: 91G10

  18. arXiv:1504.01152

    q-fin.PM math.PR

    Time-Inconsistent Stochastic Linear--Quadratic Control: Characterization and Uniqueness of Equilibrium

    Authors: Ying Hu, Hanqing Jin, Xun Yu Zhou

    Abstract: In this paper, we continue our study on a general time-inconsistent stochastic linear--quadratic (LQ) control problem originally formulated in [6]. We derive a necessary and sufficient condition for equilibrium controls via a flow of forward--backward stochastic differential equations. When the state is one dimensional and the coefficients in the problem are all deterministic, we prove that the ex…

    Submitted 26 May, 2015; v1 submitted 5 April, 2015; originally announced April 2015.

  19. arXiv:1203.3857

    math.PR

    A Note on Indefinite Stochastic Riccati Equations

    Authors: Zhongmin Qian, Xun Yu Zhou

    Abstract: An indefinite stochastic Riccati Equation is a matrix-valued, highly nonlinear backward stochastic differential equation together with an algebraic, matrix positive definiteness constraint. We introduce a new approach to solve a class of such equations (including the existence of solutions) driven by one-dimensional Brownian motion. The idea is to replace the original equation by a system of BSDEs…

    Submitted 17 March, 2012; originally announced March 2012.

  20. arXiv:1111.0818

    math.OC math.DS math.PR q-fin.PM

    Time-Inconsistent Stochastic Linear--Quadratic Control

    Authors: Ying Hu, Hanqing Jin, Xun Yu Zhou

    Abstract: In this paper, we formulate a general time-inconsistent stochastic linear--quadratic (LQ) control problem. The time-inconsistency arises from the presence of a quadratic term of the expected state as well as a state-dependent term in the objective functional. We define an equilibrium, instead of optimal, solution within the class of open-loop controls, and derive a sufficient condition for equilib…

    Submitted 3 November, 2011; originally announced November 2011.

    Comments: 24 pages. To be submitted to SICON

    MSC Class: 93E99; 60H10; 91B28

  21. arXiv:1103.1755

    math.PR math.OC q-fin.PM

    Optimal stopping under probability distortion

    Authors: Zuo Quan Xu, Xun Yu Zhou

    Abstract: We formulate an optimal stopping problem for a geometric Brownian motion where the probability scale is distorted by a general nonlinear function. The problem is inherently time inconsistent due to the Choquet integration involved. We develop a new approach, based on a reformulation of the problem where one optimally chooses the probability distribution or quantile function of the stopped state. A…

    Submitted 12 February, 2013; v1 submitted 9 March, 2011; originally announced March 2011.

    Comments: Published at http://dx.doi.org/10.1214/11-AAP838 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Journal ref: Annals of Applied Probability 2013, Vol. 23, No. 1, 251-282

  22. arXiv:0709.4467

    q-fin.PM math.NA math.OC math.PR

    A Convex Stochastic Optimization Problem Arising from Portfolio Selection

    Authors: Hanqing Jin, Zuo Quan Xu, Xun Yu Zhou

    Abstract: A continuous-time financial portfolio selection model with expected utility maximization typically boils down to solving a (static) convex stochastic optimization problem in terms of the terminal wealth, with a budget constraint. In the literature, the latter is solved by assuming {\it a priori} that the problem is well-posed (i.e., the supremum value is finite) and a Lagrange multiplier exists (and…

    Submitted 27 September, 2007; originally announced September 2007.

    Comments: 15 pages

    MSC Class: 49K20

    Journal ref: Mathematical Finance, Vol. 18, No. 1 (January 2008), 171-183

  23. Continuous-time mean-variance efficiency: the 80% rule

    Authors: Xun Li, Xun Yu Zhou

    Abstract: This paper studies a continuous-time market where an agent, having specified an investment horizon and a targeted terminal mean return, seeks to minimize the variance of the return. The optimal portfolio of such a problem is called mean-variance efficient à la Markowitz. It is shown that, when the market coefficients are deterministic functions of time, a mean-variance efficient portfolio realiz…

    Submitted 9 February, 2007; originally announced February 2007.

    Comments: Published at http://dx.doi.org/10.1214/105051606000000349 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AAP-AAP0188

    MSC Class: 90A09 (Primary) 93E20 (Secondary)

    Journal ref: Annals of Applied Probability 2006, Vol. 16, No. 4, 1751-1763

  24. Interplay between dividend rate and business constraints for a financial corporation

    Authors: Tahir Choulli, Michael Taksar, Xun Yu Zhou

    Abstract: We study a model of a corporation which has the possibility to choose various production/business policies with different expected profits and risks. In the model there are restrictions on the dividend distribution rates as well as restrictions on the risk the company can undertake. The objective is to maximize the expected present value of the total dividend distributions. We outline the corres…

    Submitted 24 March, 2005; originally announced March 2005.

    Comments: Published at http://dx.doi.org/10.1214/105051604000000909 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AAP-AAP024

    MSC Class: 91B70; 93E20. (Primary)

    Journal ref: Annals of Applied Probability 2004, Vol. 14, No. 4, 1810-1837