-
The Law of Large Numbers and CLT for Non-stationary Markov Jump Processes Exhibiting Time-of-Day Effects
Authors:
Monte Fischer,
Peter W. Glynn
Abstract:
In this paper, we develop a general law of large numbers and central limit theorem for finite state Markov jump processes with non-stationary transition rates. Such models commonly arise in service operations and manufacturing applications in which time-of-day, day-of-week, and secular effects are of first order importance in predicting system behavior. Our theorems allow for non-stationary reward environments that continuously accumulate reward, while including contributions from non-stationary lump-sum rewards of random size that are collected at jump times of the underlying process, jump times of a Poisson process modulated by the underlying process, or scheduled deterministic times. As part of our development, we also obtain a new central limit theorem for the special case in which the jump process and reward structure is periodic (as may occur over a weekly time interval), as well as for jump process models with resetting.
Submitted 9 June, 2025;
originally announced June 2025.
-
Approximation of Markov Chain Expectations and the Key Role of Stationary Distribution Convergence
Authors:
Peter W. Glynn,
Zeyu Zheng
Abstract:
Consider a sequence $P_n$ of positive recurrent transition matrices or kernels that approximate a limiting infinite state matrix or kernel $P_{\infty}$. Such approximations arise naturally when one truncates an infinite state Markov chain and replaces it with a finite state approximation. It also describes the situation in which $P_{\infty}$ is a simplified limiting approximation to $P_n$ when $n$ is large. In both settings, it is often verified that the approximation $P_n$ has the characteristic that its stationary distribution $π_n$ converges to the stationary distribution $π_{\infty}$ associated with the limit. In this paper, we show that when the state space is countably infinite, this stationary distribution convergence implies that $P_n^m$ can be approximated uniformly in $m$ by $P_{\infty}^m$ when $n$ is large. We show that this ability to approximate the marginal distributions at all time scales $m$ fails in continuous state space, but is valid when the convergence is in total variation or when we have weak convergence and the kernels are suitably Lipschitz. When the state space is discrete (as in the truncation setting), we further show that stationary distribution convergence also implies that all the expectations that are computable via first transition analysis (e.g. mean hitting times, expected infinite horizon discounted rewards) converge to those associated with the limit $P_{\infty}$. Simply put, we show that once one has established stationary distribution convergence, one immediately can infer convergence for a huge range of other expectations.
Submitted 6 May, 2025;
originally announced May 2025.
-
A Regeneration-based a Posteriori Error Bound for a Markov Chain Stationary Distribution Truncation Algorithm
Authors:
Peter W. Glynn,
Zeyu Zheng
Abstract:
When the state space of a discrete state space positive recurrent Markov chain is infinite or very large, it becomes necessary to truncate the state space in order to facilitate numerical computation of the stationary distribution. This paper develops a new approach for bounding the truncation error that arises when computing approximations to the stationary distribution. This rigorous a posteriori error bound exploits the regenerative structure of the chain and assumes knowledge of a Lyapunov function. Because the bound is a posteriori (and leverages the computations done to calculate the stationary distribution itself), it tends to be much tighter than a priori bounds. The bound decomposes the regenerative cycle into a random number of excursions from a set $K$ defined in terms of the Lyapunov function into the complement of the truncation set $A$. The bound can be easily computed, and does not (for example) involve a linear program, as do some other error bounds.
Submitted 6 May, 2025;
originally announced May 2025.
-
Computable Bounds on the Solution to Poisson's Equation for General Harris Chains
Authors:
Peter W. Glynn,
Na Lin,
Yuanyuan Liu
Abstract:
Poisson's equation is fundamental to the study of Markov chains, and arises in connection with martingale representations and central limit theorems for additive functionals, perturbation theory for stationary distributions, and average reward Markov decision process problems. In this paper, we develop a new probabilistic representation for the solution of Poisson's equation, and use Lyapunov functions to bound this solution representation explicitly. In contrast to most prior work on this problem, our bounds are computable. Our contribution is closely connected to recent work of Herve and Ledoux (2025), in which they focus their study on a special class of Harris chains satisfying a particular small set condition. However, our theory covers general Harris chains, and often provides a tighter bound. In addition to the new bound and representation, we also develop a computable uniform bound on marginal expectations for Harris chains, and a computable bound on the potential kernel representation of the solution to Poisson's equation.
Submitted 2 April, 2025;
originally announced April 2025.
-
Asymptotic Product-form Steady-state Distribution for Semimartingale Reflecting Brownian Motion in Multi-scaling Regime
Authors:
Jin Guang,
Xinyun Chen,
J. G. Dai,
Peter W. Glynn
Abstract:
Inspired by Dai et al. [2023], we develop a novel multi-scaling asymptotic regime for semimartingale reflecting Brownian motion (SRBM). In this regime, we establish the steady-state convergence of SRBM to a product-form limit with exponentially distributed components by assuming the P-reflection matrix and a uniform moment bound condition. We further demonstrate that the uniform moment bound condition holds in several subclasses of P-matrices. Our proof approach is rooted in the basic adjoint relationship (BAR) for SRBM proposed by Harrison and Williams [1987a].
Submitted 12 June, 2025; v1 submitted 25 March, 2025;
originally announced March 2025.
-
Linear Algebraic Truncation Algorithm with A Posteriori Error Bounds for Computing Markov Chain Equilibrium Gradients
Authors:
Saied Mahdian,
Peter W. Glynn
Abstract:
The numerical computation of equilibrium reward gradients for Markov chains appears in many applications, for example within the policy improvement step arising in connection with average reward stochastic dynamic programming. When the state space is large or infinite, one will typically need to truncate the state space in order to arrive at a numerically tractable formulation. In this paper, we derive the first computable a posteriori error bounds for equilibrium reward gradients that account for the error induced by the truncation. Our approach uses regeneration to express equilibrium quantities in terms of the expectations of cumulative rewards over regenerative cycles. Lyapunov functions are then used to bound the contributions to these cumulative rewards and their gradients from path excursions that take the chain outside the truncation set. Our numerical results indicate that our approach can provide highly accurate bounds with truncation sets of moderate size. We further extend our approach to Markov jump processes.
Submitted 9 January, 2025;
originally announced January 2025.
-
Small Sample Behavior of Wasserstein Projections, Connections to Empirical Likelihood, and Other Applications
Authors:
Sirui Lin,
Jose Blanchet,
Peter Glynn,
Viet Anh Nguyen
Abstract:
The empirical Wasserstein projection (WP) distance quantifies the Wasserstein distance from the empirical distribution to a set of probability measures satisfying given expectation constraints. The WP is a powerful tool because it mitigates the curse of dimensionality inherent in the Wasserstein distance, making it valuable for various tasks, including constructing statistics for hypothesis testing, optimally selecting the ambiguity size in Wasserstein distributionally robust optimization, and studying algorithmic fairness. While the weak convergence analysis of the WP as the sample size $n$ grows is well understood, higher-order (i.e., sharp) asymptotics of WP remain unknown. In this paper, we study the second-order asymptotic expansion and the Edgeworth expansion of WP, both expressed as power series of $n^{-1/2}$. These expansions are essential to develop improved confidence level accuracy and a power expansion analysis for the WP-based tests for moment equations null against local alternative hypotheses. As a by-product, we obtain insightful criteria for comparing the power of the Empirical Likelihood and Hotelling's $T^2$ tests against the WP-based test. This insight provides the first comprehensive guideline for selecting the most powerful local test among WP-based, empirical-likelihood-based, and Hotelling's $T^2$ tests for a null. Furthermore, we introduce Bartlett-type corrections to improve the approximation to WP distance quantiles and, thus, improve the coverage in WP applications.
Submitted 21 August, 2024;
originally announced August 2024.
-
Online Linear Programming with Batching
Authors:
Haoran Xu,
Peter W. Glynn,
Yinyu Ye
Abstract:
We study Online Linear Programming (OLP) with batching. The planning horizon is cut into $K$ batches, and the decisions on customers arriving within a batch can be delayed to the end of their associated batch. Compared with OLP without batching, the ability to delay decisions brings better operational performance, as measured by regret. Two research questions of interest are: (1) What is a lower bound of the regret as a function of $K$? (2) What algorithms can achieve the regret lower bound? These questions have been analyzed in the literature when the distribution of the reward and the resource consumption of the customers have finite support. By contrast, this paper analyzes these questions when the conditional distribution of the reward given the resource consumption is continuous, and we show the answers are different under this setting. When there is only a single type of resource and the decision maker knows the total number of customers, we propose an algorithm with a $O(\log K)$ regret upper bound and provide a $Ω(\log K)$ regret lower bound. We also propose algorithms with $O(\log K)$ regret upper bound for the setting in which there are multiple types of resource and the setting in which customers arrive following a Poisson process. All these regret upper and lower bounds are independent of the length of the planning horizon, and all the proposed algorithms delay decisions on customers arriving in only the first and the last batch. We also take customer impatience into consideration and establish a way of selecting an appropriate batch size.
Submitted 1 August, 2024;
originally announced August 2024.
-
An Efficient High-Dimensional Gradient Estimator for Stochastic Differential Equations
Authors:
Shengbo Wang,
Jose Blanchet,
Peter Glynn
Abstract:
Overparameterized stochastic differential equation (SDE) models have achieved remarkable success in various complex environments, such as PDE-constrained optimization, stochastic control and reinforcement learning, financial engineering, and neural SDEs. These models often feature system evolution coefficients that are parameterized by a high-dimensional vector $θ\in \mathbb{R}^n$, aiming to optimize expectations of the SDE, such as a value function, through stochastic gradient ascent. Consequently, designing efficient gradient estimators for which the computational complexity scales well with $n$ is of significant interest. This paper introduces a novel unbiased stochastic gradient estimator--the generator gradient estimator--for which the computation time remains stable in $n$. In addition to establishing the validity of our methodology for general SDEs with jumps, we also perform numerical experiments that test our estimator in linear-quadratic control problems parameterized by high-dimensional neural networks. The results show a significant improvement in efficiency compared to the widely used pathwise differentiation method: Our estimator achieves near-constant computation times, increasingly outperforms its counterpart as $n$ increases, and does so without compromising estimation variance. These empirical findings highlight the potential of our proposed methodology for optimizing SDEs in contemporary applications.
Submitted 26 September, 2024; v1 submitted 13 July, 2024;
originally announced July 2024.
-
Deep Learning for Computing Convergence Rates of Markov Chains
Authors:
Yanlin Qu,
Jose Blanchet,
Peter Glynn
Abstract:
Convergence rate analysis for general state-space Markov chains is fundamentally important in areas such as Markov chain Monte Carlo and algorithmic analysis (for computing explicit convergence bounds). This problem, however, is notoriously difficult because traditional analytical methods often do not generate practically useful convergence bounds for realistic Markov chains. We propose the Deep Contractive Drift Calculator (DCDC), the first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance. The DCDC has two components. First, inspired by the new convergence analysis framework in (Qu et al., 2023), we introduce the Contractive Drift Equation (CDE), the solution of which leads to an explicit convergence bound. Second, we develop an efficient neural-network-based CDE solver. Equipped with these two components, DCDC solves the CDE and converts the solution into a convergence bound. We analyze the sample complexity of the algorithm and further demonstrate the effectiveness of the DCDC by generating convergence bounds for realistic Markov chains arising from stochastic processing networks as well as constant step-size stochastic optimization.
Submitted 30 May, 2024;
originally announced May 2024.
-
When are Unbiased Monte Carlo Estimators More Preferable than Biased Ones?
Authors:
Guanyang Wang,
Jose Blanchet,
Peter W. Glynn
Abstract:
Due to the potential benefits of parallelization, designing unbiased Monte Carlo estimators, primarily in the setting of randomized multilevel Monte Carlo, has recently become very popular in operations research and computational statistics. However, existing work primarily substantiates the benefits of unbiased estimators at an intuitive level or using empirical evaluations. The intuition is that unbiased estimators can be replicated in parallel, enabling fast estimation in terms of wall-clock time. This intuition ignores that, typically, bias will be introduced due to impatience because most unbiased estimators necessitate random completion times. This paper provides a mathematical framework for comparing these methods under various metrics, such as completion time and overall computational cost. Under practical assumptions, our findings reveal that unbiased methods typically have superior completion times - the degree of superiority being quantifiable through the tail behavior of their running time distribution - but they may not automatically provide substantial savings in overall computational costs. We apply our findings to Markov Chain Monte Carlo and Multilevel Monte Carlo methods to identify the conditions and scenarios where unbiased methods have an advantage, thus assisting practitioners in making informed choices between unbiased and biased methods.
Submitted 1 April, 2024;
originally announced April 2024.
-
A Numerical Truncation Approximation with A Posteriori Error Bounds for the Solution of Poisson's Equation
Authors:
Saied Mahdian,
Peter W. Glynn,
Yuanyuan Liu
Abstract:
The solution to Poisson's equation arises in many Markov chain and Markov jump process settings, including that of the central limit theorem, value functions for average reward Markov decision processes, and within the gradient formula for equilibrium Markovian rewards. In this paper, we consider the problem of numerically computing the solution to Poisson's equation when the state space is infinite or very large. In such settings, the state space must be truncated in order to make the problem computationally tractable. In this paper, we provide the first truncation approximation solution to Poisson's equation that comes with provable and computable a posteriori error bounds. Our theory applies to both discrete-time chains and continuous-time jump processes. Through numerical experiments, we show our method can provide highly accurate solutions and tight bounds.
Submitted 29 January, 2024;
originally announced January 2024.
-
On the Maximization of Long-Run Reward CVaR for Markov Decision Processes
Authors:
Li Xia,
Zhihui Yu,
Peter W. Glynn
Abstract:
This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directions, we prove that the maximum of long-run CVaR of MDPs over the set of history-dependent randomized policies can be found within the class of stationary randomized policies. In contrast to classical MDPs, we find that there may not exist an optimal stationary deterministic policy for maximizing CVaR. Instead, we prove the existence of an optimal stationary randomized policy that requires randomizing over at most two actions. Via a convex optimization representation of CVaR, we convert the long-run CVaR maximization MDP into a minimax problem, where we prove the interchangeability of minimum and maximum and the related existence of saddle point solutions. Furthermore, we propose an algorithm that finds the saddle point solution by solving two linear programs. These results are then extended to objectives that involve maximizing some combination of mean and CVaR of rewards simultaneously. Finally, we conduct numerical experiments to demonstrate the main results.
Submitted 3 December, 2023;
originally announced December 2023.
-
Moments of polynomial functionals of spectrally positive Lévy processes
Authors:
Peter W. Glynn,
Royi Jacobovic,
Michel Mandjes
Abstract:
Let $J(\cdot)$ be a compound Poisson process with rate $λ>0$ and a jump distribution $G(\cdot)$ concentrated on $(0,\infty)$. In addition, let $V$ be a random variable which is distributed according to $G(\cdot)$ and independent of $J(\cdot)$. Define a new process $W(t)\equiv W_V(t)\equiv V+J(t)-t$, $t\geqslant 0$, and let $τ_V$ be the first time that $W(\cdot)$ hits the origin. A long-standing open problem due to Iglehart (1971) and Cohen (1979) is to derive the moments of the functional $\int_0^{τ_V}W(t)\,{\rm d}t$ in terms of the moments of $G(\cdot)$ and $λ$. In the current work, we solve this problem in much greater generality, i.e., first by letting $J(\cdot)$ belong to a wide class of spectrally positive Lévy processes and secondly, by considering a more general class of functionals. We also supply several applications of the existing results, e.g., in studying the process $x\mapsto \int_0^{τ_x}W_x(t)\,{\rm d}t$ defined on $x\in[0,\infty)$.
Submitted 16 April, 2025; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Optimal Sample Complexity for Average Reward Markov Decision Processes
Authors:
Shengbo Wang,
Jose Blanchet,
Peter Glynn
Abstract:
We resolve the open question regarding the sample complexity of policy learning for maximizing the long-run average reward associated with a uniformly ergodic Markov decision process (MDP), assuming a generative model. In this context, the existing literature provides a sample complexity upper bound of $\widetilde O(|S||A|t_{\text{mix}}^2 ε^{-2})$ and a lower bound of $Ω(|S||A|t_{\text{mix}} ε^{-2})$. In these expressions, $|S|$ and $|A|$ denote the cardinalities of the state and action spaces respectively, $t_{\text{mix}}$ serves as a uniform upper limit for the total variation mixing times, and $ε$ signifies the error tolerance. Therefore, a notable gap of $t_{\text{mix}}$ still remains to be bridged. Our primary contribution is the development of an estimator for the optimal policy of average reward MDPs with a sample complexity of $\widetilde O(|S||A|t_{\text{mix}}ε^{-2})$. This marks the first algorithm and analysis to reach the literature's lower bound. Our new algorithm draws inspiration from ideas in Li et al. (2020), Jin and Sidford (2021), and Wang et al. (2023). Additionally, we conduct numerical experiments to validate our theoretical findings.
Submitted 12 February, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Computable Bounds on Convergence of Markov Chains in Wasserstein Distance via Contractive Drift
Authors:
Yanlin Qu,
Jose Blanchet,
Peter Glynn
Abstract:
We introduce a unified framework to estimate the convergence of Markov chains to equilibrium in Wasserstein distance. The framework can provide convergence bounds with rates ranging from polynomial to exponential, all derived from a contractive drift condition that integrates not only contraction and drift but also coupling and metric design. The resulting bounds are computable, as they contain simple constants, one-step transition expectations, but no equilibrium-related quantities. We introduce the large M technique and the boundary removal technique to enhance the applicability of the framework, which is further enhanced by deep learning in Qu, Blanchet and Glynn (2024). We apply the framework to non-contractive or even expansive Markov chains arising from queueing theory, stochastic optimization, and Markov chain Monte Carlo.
Submitted 5 June, 2025; v1 submitted 20 August, 2023;
originally announced August 2023.
-
Overlapping Batch Confidence Intervals on Statistical Functionals Constructed from Time Series: Application to Quantiles, Optimization, and Estimation
Authors:
Ziwei Su,
Raghu Pasupathy,
Yingchieh Yeh,
Peter W. Glynn
Abstract:
We propose a general purpose confidence interval procedure (CIP) for statistical functionals constructed using data from a stationary time series. The procedures we propose are based on derived distribution-free analogues of the $χ^2$ and Student's $t$ random variables for the statistical functional context, and hence apply in a wide variety of settings including quantile estimation, gradient estimation, M-estimation, CVaR estimation, and arrival process rate estimation, apart from more traditional statistical settings. Like the method of subsampling, we use overlapping batches of time series data to estimate the underlying variance parameter; unlike subsampling and the bootstrap, however, we assume that the implied point estimator of the statistical functional obeys a central limit theorem (CLT) to help identify the weak asymptotics (called OB-x limits, x=I,II,III) of batched Studentized statistics. The OB-x limits, certain functionals of the Wiener process parameterized by the size of the batches and the extent of their overlap, form the essential machinery for characterizing dependence, and consequently the correctness of the proposed CIPs. The message from extensive numerical experimentation is that in settings where a functional CLT on the point estimator is in effect, using large overlapping batches alongside OB-x critical values yields confidence intervals that are often of significantly higher quality than those obtained from more generic methods like subsampling or the bootstrap. We illustrate using examples from CVaR estimation, ARMA parameter estimation, and NHPP rate estimation; R and MATLAB code for OB-x critical values is available at web.ics.purdue.edu/~pasupath/.
Submitted 17 July, 2023;
originally announced July 2023.
-
Asymptotic product-form steady-state for generalized Jackson networks in multi-scale heavy traffic
Authors:
J. G. Dai,
Peter Glynn,
Yaosheng Xu
Abstract:
We prove that under a multi-scale heavy traffic condition, the stationary distribution of the scaled queue length vector process in any generalized Jackson network has a product-form limit. Each component in the product form follows an exponential distribution, corresponding to the Brownian approximation of a single station queue. The "single station" can be constructed precisely and its parameters have a good intuitive interpretation.
Submitted 26 January, 2025; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Gaussian Limits for Scheduled Traffic with Super-Heavy Tailed Perturbations
Authors:
Victor F. Araman,
Peter W. Glynn
Abstract:
A scheduled arrival model is one in which customers are scheduled to arrive at constant interarrival times, but each customer's actual arrival time is perturbed from her scheduled arrival time by a random perturbation. The sequence of perturbations is independent and identically distributed. It has previously been shown that the arrival counting process for scheduled traffic obeys a functional central limit theorem (FCLT) with fractional Brownian motion (fBM) with Hurst parameter H strictly between 0 and 1/2 when the perturbations have a Pareto-like tail with tail exponent lying in (0,1). Such limit processes exhibit less variability than Brownian motion, because the scheduling feature induces negative correlations in the arrival process. In this paper, we show that when the perturbations have a super-heavy tail, the FCLT limit process is Brownian motion (i.e. H=1/2), so that the heaviness of the tails eliminates any remaining negative correlations and generates a limit process with independent increments. We further study the case when the perturbations have a Cauchy-like tail, and show that the limit process in this setting is a fBM with H=0. So, this paper shows that the entire range of fBMs with $H \in [0,1/2]$ is possible as limits of scheduled traffic.
Submitted 10 March, 2023;
originally announced March 2023.
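To make the scheduled arrival model in the abstract above concrete, the following Python sketch generates perturbed arrival times and evaluates the arrival counting process at a few time points. The function names, the indexing of customers by 1, 2, ..., the unit scheduling interval, the nonnegative perturbations, and the use of NumPy's Lomax-type `pareto` sampler with tail exponent 0.5 are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def scheduled_arrival_times(perturbations, spacing=1.0):
    """Customer j (j = 1, 2, ...) is scheduled at j * spacing and
    actually arrives at that time plus its own perturbation."""
    perturbations = np.asarray(perturbations, dtype=float)
    scheduled = spacing * np.arange(1, perturbations.size + 1)
    return scheduled + perturbations

def arrival_count(arrival_times, t):
    """Arrival counting process: number of customers arrived by time t."""
    return int(np.count_nonzero(np.asarray(arrival_times) <= t))

rng = np.random.default_rng(0)
alpha = 0.5  # assumed tail exponent in (0, 1), chosen only for illustration
xi = rng.pareto(alpha, size=10_000)  # i.i.d. nonnegative perturbations
arrivals = scheduled_arrival_times(xi)

horizons = (10, 100, 1000)
counts = [arrival_count(arrivals, t) for t in horizons]

# Sanity case: with zero perturbations the count at time t is floor(t).
unperturbed = scheduled_arrival_times(np.zeros(5))
```

Because the perturbations in this sketch are nonnegative, every customer arrives no earlier than scheduled, so the count at time t can never exceed the number of customers scheduled by t.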
-
Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes
Authors:
Shengbo Wang,
Jose Blanchet,
Peter Glynn
Abstract:
We consider the optimal sample complexity theory of tabular reinforcement learning (RL) for maximizing the infinite horizon discounted reward in a Markov decision process (MDP). Optimal worst-case complexity results have been developed for tabular RL problems in this setting, leading to a sample complexity dependence on $γ$ and $ε$ of the form $\tilde Θ((1-γ)^{-3}ε^{-2})$, where $γ$ denotes the discount factor and $ε$ is the solution error tolerance. However, in many applications of interest, the optimal policy (or all policies) induces mixing. We establish that in such settings, the optimal sample complexity dependence is $\tilde Θ(t_{\text{mix}}(1-γ)^{-2}ε^{-2})$, where $t_{\text{mix}}$ is the total variation mixing time. Our analysis is grounded in regeneration-type ideas, which we believe are of independent interest, as they can be used to study RL problems for general state space MDPs.
Submitted 30 September, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
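As a rough numerical reading of the two rates quoted in the abstract above, the Python sketch below evaluates $(1-γ)^{-3}ε^{-2}$ and $t_{\text{mix}}(1-γ)^{-2}ε^{-2}$ for one parameter choice. Constants and logarithmic factors hidden by the $\tilde Θ$ notation are ignored, and the function names and the values of `gamma`, `eps`, and `t_mix` are assumptions made only for illustration.

```python
def worst_case_rate(gamma, eps):
    """Leading-order worst-case dependence (1 - gamma)^-3 * eps^-2."""
    return (1.0 - gamma) ** -3 * eps ** -2

def mixing_rate(t_mix, gamma, eps):
    """Leading-order dependence t_mix * (1 - gamma)^-2 * eps^-2 under mixing."""
    return t_mix * (1.0 - gamma) ** -2 * eps ** -2

gamma, eps, t_mix = 0.99, 0.1, 10.0  # hypothetical values
ratio = mixing_rate(t_mix, gamma, eps) / worst_case_rate(gamma, eps)
# Algebraically the ratio equals t_mix * (1 - gamma).
```

At this level of approximation the mixing rate is the smaller of the two whenever `t_mix` is below the effective horizon `1 / (1 - gamma)`.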
-
Stability of a Queue Fed by Scheduled Traffic at Critical Loading
Authors:
Victor F. Araman,
Peter W. Glynn
Abstract:
Consider the workload process for a single server queue with deterministic service times in which customers arrive according to a scheduled traffic process. A scheduled arrival sequence is one in which customers are scheduled to arrive at constant interarrival times, but each customer's actual arrival time is perturbed from her scheduled arrival time by a random perturbation. In this paper, we consider a critically loaded queue in which the service rate equals the arrival rate. Unlike a queue fed by renewal traffic, this queue can be stable even in the presence of critical loading. We identify a necessary and sufficient condition for stability when the perturbations have finite mean. Perhaps surprisingly, the criterion is not reversible, in the sense that such a queue can be stable for a scheduled traffic process in forward time, but unstable for the time-reversal of the same traffic process.
Submitted 8 December, 2022;
originally announced December 2022.
-
Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion
Authors:
Li Xia,
Peter W. Glynn
Abstract:
CVaR (Conditional Value at Risk) is a risk metric widely used in finance. However, dynamically optimizing CVaR is difficult since it is not a standard Markov decision process (MDP) and the principle of dynamic programming fails. In this paper, we study the infinite-horizon discrete-time MDP with a long-run CVaR criterion, from the view of sensitivity-based optimization. By introducing a pseudo CVaR metric, we derive a CVaR difference formula which quantifies the difference of long-run CVaR under any two policies. The optimality of deterministic policies is derived. We obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for local optimal policies and only necessary for global optimal policies. A CVaR derivative formula is also derived for providing more sensitivity information. Then we develop a policy iteration type algorithm to efficiently optimize CVaR, which is shown to converge to local optima in the mixed policy space. We further discuss some extensions including the mean-CVaR optimization and the maximization of CVaR. Finally, we conduct numerical experiments relating to portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity viewpoint.
Submitted 17 October, 2022;
originally announced October 2022.
-
The Typical Behavior of Bandit Algorithms
Authors:
Lin Fan,
Peter W. Glynn
Abstract:
We establish strong laws of large numbers and central limit theorems for the regret of two of the most popular bandit algorithms: Thompson sampling and UCB. Here, our characterizations of the regret distribution complement the characterizations of the tail of the regret distribution recently developed by Fan and Glynn (2021) (arXiv:2109.13595). The tail characterizations there are associated with atypical bandit behavior on trajectories where the optimal arm mean is under-estimated, leading to mis-identification of the optimal arm and large regret. In contrast, our SLLN's and CLT's here describe the typical behavior and fluctuation of regret on trajectories where the optimal arm mean is properly estimated. We find that Thompson sampling and UCB satisfy the same SLLN and CLT, with the asymptotics of both the SLLN and the (mean) centering sequence in the CLT matching the asymptotics of expected regret. Both the mean and variance in the CLT grow at $\log(T)$ rates with the time horizon $T$. Asymptotically as $T \to \infty$, the variability in the number of plays of each sub-optimal arm depends only on the rewards received for that arm, which indicates that each sub-optimal arm contributes independently to the overall CLT variance.
Submitted 11 October, 2022;
originally announced October 2022.
-
A New Truncation Algorithm for Markov Chain Equilibrium Distributions with Computable Error Bounds
Authors:
Alex Infanger,
Peter W. Glynn
Abstract:
This paper introduces a new algorithm for numerically computing equilibrium (i.e. stationary) distributions for Markov chains and Markov jump processes with either a very large finite state space or a countably infinite state space. The algorithm is based on a ratio representation for equilibrium expectations in which the numerator and denominator correspond to expectations defined over paths that start and end within a given return set $K$. When $K$ is a singleton, this representation is a well-known consequence of regenerative process theory. For computational tractability, we ignore contributions to the path expectations corresponding to excursions out of a given truncation set $A$. This yields a truncation algorithm that is provably convergent as $A$ gets large. Furthermore, in the presence of a suitable Lyapunov function, we can bound the path expectations, thereby providing computable and convergent error bounds for our numerical procedure. Our paper also provides a computational comparison with two other truncation methods that come with computable error bounds. The results are in alignment with the observation that our bounds have associated computational complexities that typically scale better as the truncation set gets bigger.
Submitted 30 August, 2022;
originally announced August 2022.
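The abstract above notes that, when the return set $K$ is a singleton, the ratio representation is the classical regenerative identity: an equilibrium expectation equals the expected reward accumulated over one return cycle divided by the expected cycle length. The Python sketch below checks that identity numerically on a small randomly generated finite chain. It illustrates only the singleton case, not the paper's truncation algorithm or its error bounds; the function names, the return state `z = 0`, and the test chain are assumptions.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an irreducible finite chain."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0          # replace one balance equation by normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def cycle_expectation(P, f, z=0):
    """E_z[sum_{n=0}^{tau-1} f(X_n)], where tau is the first return time to z."""
    n = P.shape[0]
    Q = P.copy()
    Q[:, z] = 0.0           # taboo kernel: paths are stopped on entering z
    u = np.linalg.solve(np.eye(n) - Q, f)
    return u[z]

rng = np.random.default_rng(1)
P = rng.random((6, 6))
P /= P.sum(axis=1, keepdims=True)   # strictly positive stochastic matrix
f = np.arange(6, dtype=float)

ratio_value = cycle_expectation(P, f) / cycle_expectation(P, np.ones(6))
direct_value = stationary_distribution(P) @ f
```

The two quantities `ratio_value` and `direct_value` should agree up to floating-point error, since the denominator is the mean return time to the chosen state.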
-
A Short Proof of a Convex Representation for Stationary Distributions of Markov Chains with an Application to State Space Truncation
Authors:
Zeyu Zheng,
Alex Infanger,
Peter W. Glynn
Abstract:
In an influential paper, Courtois and Semal (1984) establish that when $G$ is an irreducible substochastic matrix for which $\sum_{n=0}^{\infty}G^n <\infty$, then the stationary distribution of any stochastic matrix $P\ge G$ can be expressed as a convex combination of the normalized rows of $(I-G)^{-1} = \sum_{n=0}^{\infty} G^n$. In this note, we give a short proof of this result that extends the theory to the countably infinite and continuous state space settings. This result plays an important role in obtaining error bounds in algorithms involving nearly decomposable Markov chains, and also in state truncations for Markov chains. We also use the representation to establish a new total variation distance error bound for truncated Markov chains.
Submitted 6 August, 2022;
originally announced August 2022.
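The convex representation stated in the abstract above can be checked numerically in the finite state case. Writing $π = πP$ as $π(I-G) = π(P-G)$ gives $π = ν(I-G)^{-1}$ with $ν = π(P-G) \ge 0$, so the weight on the $i$-th normalized row is $ν_i$ times the $i$-th row sum of $(I-G)^{-1}$. The Python sketch below implements this one-line argument; the function names and the particular choice of $G$ (the matrix $P$ with its last column halved) are assumptions made for illustration, and the sketch does not reproduce the paper's extension to infinite or continuous state spaces.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an irreducible finite chain."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def convex_representation(P, G):
    """Return (pi, weights, normalized rows of (I - G)^{-1})."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    N = np.linalg.inv(np.eye(n) - G)      # equals sum_{n >= 0} G^n
    row_sums = N.sum(axis=1)
    nu = pi @ (P - G)                     # nonnegative because P >= G
    weights = nu * row_sums
    return pi, weights, N / row_sums[:, None]

rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)
G = P.copy()
G[:, -1] *= 0.5                           # substochastic, irreducible, G <= P

pi, weights, rows = convex_representation(P, G)
reconstructed = weights @ rows
```

Here `weights` should be nonnegative and sum to one, and `reconstructed` should match `pi` up to rounding.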
-
On Convergence of a Truncation Scheme for Approximating Stationary Distributions of Continuous State Space Markov Chains and Processes
Authors:
Alex Infanger,
Peter W. Glynn
Abstract:
In the analysis of Markov chains and processes, it is sometimes convenient to replace an unbounded state space with a "truncated" bounded state space. When such a replacement is made, one often wants to know whether the equilibrium behavior of the truncated chain or process is close to that of the untruncated system. For example, such questions arise naturally when considering numerical methods for computing stationary distributions on unbounded state space. In this paper, we use the principle of "regeneration" to show that the stationary distributions of "fixed state" truncations converge in great generality (in total variation norm) to the stationary distribution of the untruncated limit, when the untruncated chain is positive Harris recurrent. Even in countable state space, our theory extends known results by showing that the augmentation can correspond to an $r$-regular measure. In addition, we extend our theory to cover an important subclass of Harris recurrent Markov processes that include non-explosive Markov jump processes on countable state space.
Submitted 23 June, 2022;
originally announced June 2022.
-
On Convergence of General Truncation-Augmentation Schemes for Approximating Stationary Distributions of Markov Chains
Authors:
Alex Infanger,
Peter W. Glynn,
Yuanyuan Liu
Abstract:
In the analysis of Markov chains and processes, it is sometimes convenient to replace an unbounded state space with a "truncated" bounded state space. When such a replacement is made, one often wants to know whether the equilibrium behavior of the truncated chain or process is close to that of the untruncated system. For example, such questions arise naturally when considering numerical methods for computing stationary distributions on unbounded state space. In this paper, we study general truncation-augmentation schemes, in which the substochastic truncated "northwest corner" of the transition matrix or kernel is stochasticized (or augmented) arbitrarily. In the presence of a Lyapunov condition involving a coercive function, we show that such schemes are generally convergent in countable state space, provided that the truncation is chosen as a sublevel set of the Lyapunov function. For stochastically monotone Markov chains on $\mathbb Z_+$, we prove that we can always choose the truncation sets to be of the form $\{0,1,...,n\}$. We then provide sufficient conditions for weakly continuous Markov chains under which general truncation-augmentation schemes converge weakly in continuous state space. Finally, we briefly discuss the extension of the theory to continuous time Markov jump processes.
Submitted 28 March, 2022;
originally announced March 2022.
-
Solutions of Poisson's Equation for Stochastically Monotone Markov Chains
Authors:
Peter W. Glynn,
Alex Infanger
Abstract:
Stochastically monotone Markov chains arise in many applied domains, especially in the setting of queues and storage systems. Poisson's equation is a key tool for analyzing additive functionals of such models, such as cumulative sums of waiting times or sums of rewards. In this paper, we show that when the reward function for such a Markov chain is monotone, the solution of Poisson's equation is monotone. This implies that the value function associated with infinite horizon average reward is monotone in the state when the reward is monotone.
Submitted 21 February, 2022;
originally announced February 2022.
-
Solving Poisson's Equation: Existence, Uniqueness, Martingale Structure, and CLT
Authors:
Peter W. Glynn,
Alex Infanger
Abstract:
The solution of Poisson's equation plays a key role in constructing the martingale through which sums of Markov correlated random variables can be analyzed. In this paper, we study two different representations for the solution in countable state space, one based on regenerative structure and the other based on an infinite sum of expectations. We also consider integrability and related uniqueness issues associated with solutions to Poisson's equation, and provide verifiable Lyapunov conditions to support our theory. Our key results include a central limit theorem and law of the iterated logarithm for Markov dependent sums, under Lyapunov conditions weaker than have previously appeared in the literature.
Submitted 21 February, 2022;
originally announced February 2022.
-
The Fragility of Optimized Bandit Algorithms
Authors:
Lin Fan,
Peter W. Glynn
Abstract:
Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the regret distribution of the associated algorithms necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the total number of arm plays. We show that optimized UCB bandit designs are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also show a sharp trade-off between the amount of UCB exploration and the heaviness of the resulting regret distribution tail.
Submitted 12 November, 2024; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Distributed stochastic optimization with large delays
Authors:
Zhengyuan Zhou,
Panayotis Mertikopoulos,
Nicholas Bambos,
Peter W. Glynn,
Yinyu Ye
Abstract:
One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures (possibly) asynchronously. However, a key obstacle in the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may have already been updated by other nodes several times over, thereby rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution is that, by carefully tuning the algorithm's step-size, convergence to the critical set is still achieved in mean square, even if the delays grow unbounded at a polynomial rate. We also establish finer results in a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability $1$ under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale non-convex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design.
Submitted 6 July, 2021;
originally announced July 2021.
-
Distributionally Robust Martingale Optimal Transport
Authors:
Zhengqing Zhou,
Jose Blanchet,
Peter W. Glynn
Abstract:
We study the problem of bounding path-dependent expectations (within any finite time horizon $d$) over the class of discrete-time martingales whose marginal distributions lie within a prescribed tolerance of a given collection of benchmark marginal distributions. This problem is a relaxation of the martingale optimal transport (MOT) problem and is motivated by applications to super-hedging in financial markets. We show that the empirical version of our relaxed MOT problem can be approximated within $O\left( n^{-1/2}\right)$ error where $n$ is the number of samples of each of the individual marginal distributions (generated independently) and using a suitably constructed finite-dimensional linear programming problem.
Submitted 29 November, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Unbiased Optimal Stopping via the MUSE
Authors:
Zhengqing Zhou,
Guanyang Wang,
Jose Blanchet,
Peter W. Glynn
Abstract:
We propose a new unbiased estimator for estimating the utility of the optimal stopping problem. The MUSE, short for Multilevel Unbiased Stopping Estimator, constructs the unbiased Multilevel Monte Carlo (MLMC) estimator at every stage of the optimal stopping problem in a backward recursive way. In contrast to traditional sequential methods, the MUSE can be implemented in parallel. We prove the MUSE has finite variance, finite computational complexity, and achieves $ε$-accuracy with $O(1/ε^2)$ computational cost under mild conditions. We demonstrate MUSE empirically in an option pricing problem involving a high-dimensional input and the use of many parallel processors.
Submitted 26 December, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Diffusion Approximations for Thompson Sampling
Authors:
Lin Fan,
Peter W. Glynn
Abstract:
We study the behavior of Thompson sampling from the perspective of weak convergence. In the regime with small $γ> 0$, where the gaps between arm means scale as $\sqrtγ$ and over time horizons that scale as $1/γ$, we show that the dynamics of Thompson sampling evolve according to discrete versions of SDE's and stochastic ODE's. As $γ\downarrow 0$, we show that the dynamics converge weakly to solutions of the corresponding SDE's and stochastic ODE's. Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem, and can be easily adapted to analyze other sampling-based bandit algorithms. In this regime, we also show that the weak limits of the dynamics of many sampling-based algorithms -- including Thompson sampling designed for single-parameter exponential family rewards, and algorithms using bootstrap-based sampling to balance exploration and exploitation -- coincide with those of Gaussian Thompson sampling. Moreover, in this regime, these algorithms are generally robust to model mis-specification.
Submitted 11 May, 2025; v1 submitted 19 May, 2021;
originally announced May 2021.
-
On Incorporating Forecasts into Linear State Space Model Markov Decision Processes
Authors:
Jacques A. de Chalendar,
Peter W. Glynn
Abstract:
Weather forecast information will very likely find increasing application in the control of future energy systems. In this paper, we introduce an augmented state space model formulation with linear dynamics, within which one can incorporate forecast information that is dynamically revealed alongside the evolution of the underlying state variable. We use the martingale model for forecast evolution (MMFE) to enforce the necessary consistency properties that must govern the joint evolution of forecasts with the underlying state. The formulation also generates jointly Markovian dynamics that give rise to Markov decision processes (MDPs) that remain computationally tractable. This paper is the first to enforce MMFE consistency requirements within an MDP formulation that preserves tractability.
Submitted 12 March, 2021;
originally announced March 2021.
-
On a Single Server Queue Fed by a Scheduled Traffic with Pareto Perturbations
Authors:
V. F. Araman,
H. Chen,
P. W. Glynn,
L. Xia
Abstract:
A "scheduled" arrival process is one in which the $n$th arrival is scheduled for time $n$, but instead occurs at a different time. The difference between the scheduled time and the arrival time is called the perturbation. The sequence of perturbations is assumed to be iid. We describe here the behavior of a single server queue fed by such traffic in which the processing times are deterministic. A particular focus is on perturbations with Pareto-like tails but with finite mean. We obtain tail approximations for the steady-state workload both when the queue is critically loaded and under a heavy-traffic regime. A key to our approach is our analysis of the tail behavior of a sum of independent Bernoulli random variables with success probabilities following a power law with parameter strictly larger than 1.
Submitted 13 February, 2021;
originally announced February 2021.
-
Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach
Authors:
Peter Glynn,
Ramesh Johari,
Mohammad Rasouli
Abstract:
Suppose an online platform wants to compare a treatment and control policy, e.g., two different matching algorithms in a ridesharing system, or two different inventory management algorithms in an online retail site. Standard randomized controlled trials are typically not feasible, since the goal is to estimate policy performance on the entire system. Instead, the typical current practice involves dynamically alternating between the two policies for fixed lengths of time, and comparing the average performance of each over the intervals in which they were run as an estimate of the treatment effect. However, this approach suffers from *temporal interference*: one algorithm alters the state of the system as seen by the second algorithm, biasing estimates of the treatment effect. Further, the simple non-adaptive nature of such designs implies they are not sample efficient.
We develop a benchmark theoretical model in which to study optimal experimental design for this setting. We view testing the two policies as the problem of estimating the steady state difference in reward between two unknown Markov chains (i.e., policies). We assume estimation of the steady state reward for each chain proceeds via nonparametric maximum likelihood, and search for consistent (i.e., asymptotically unbiased) experimental designs that are efficient (i.e., asymptotically minimum variance). Characterizing such designs is equivalent to a Markov decision problem with a minimum variance objective; such problems generally do not admit tractable solutions. Remarkably, in our setting, using a novel application of classical martingale analysis of Markov chains via Poisson's equation, we characterize efficient designs via a succinct convex optimization problem. We use this characterization to propose a consistent, efficient online experimental design that adaptively samples the two Markov chains.
Submitted 24 December, 2022; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Efficient Steady-state Simulation of High-dimensional Stochastic Networks
Authors:
Jose Blanchet,
Xinyun Chen,
Peter Glynn,
Nian Si
Abstract:
We propose and study an asymptotically optimal Monte Carlo estimator for steady-state expectations of a $d$-dimensional reflected Brownian motion (RBM). Our estimator is asymptotically optimal in the sense that it requires $\tilde{O}(d)$ (up to logarithmic factors in $d$) i.i.d. Gaussian random variables in order to output an estimate with a controlled error. Our construction is based on the analysis of a suitable multi-level Monte Carlo strategy which, we believe, can be applied widely. This is the first algorithm with linear complexity (under suitable regularity conditions) for steady-state estimation of RBM as the dimension increases.
Submitted 27 January, 2020; v1 submitted 23 January, 2020;
originally announced January 2020.
-
Optimal $δ$-Correct Best-Arm Selection for Heavy-Tailed Distributions
Authors:
Shubhada Agrawal,
Sandeep Juneja,
Peter Glynn
Abstract:
Given a finite set of unknown distributions or arms that can be sampled, we consider the problem of identifying the one with the maximum mean using a $δ$-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified $δ$) that has minimum sample complexity. Lower bounds for $δ$-correct algorithms are well known. $δ$-correct algorithms that match the lower bound asymptotically as $δ$ reduces to zero have been previously developed when arm distributions are restricted to a single-parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise, under a $δ$-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a $δ$-correct algorithm that matches the lower bound as $δ$ reduces to zero under the mild restriction that a known bound on the expectation of the $(1+ε)^{th}$ moment of the underlying random variables exists, for $ε > 0$. We also propose batch processing and identify near-optimal batch sizes to speed up the proposed algorithm substantially. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well-studied classic problem in the simulation community.
Submitted 24 November, 2023; v1 submitted 24 August, 2019;
originally announced August 2019.
-
Optimal Transport Relaxations with Application to Wasserstein GANs
Authors:
Saied Mahdian,
Jose Blanchet,
Peter Glynn
Abstract:
We propose a family of relaxations of the optimal transport problem which regularize the problem by introducing an additional minimization step over a small region around one of the underlying transporting measures. The type of regularization that we obtain is related to smoothing techniques studied in the optimization literature. When using our approach to estimate optimal transport costs based on empirical measures, we obtain statistical learning bounds which are useful to guide the amount of regularization, while maintaining good generalization properties. To illustrate the computational advantages of our regularization approach, we apply our method to training Wasserstein GANs. We obtain running time improvements, relative to current benchmarks, with no deterioration in testing performance (via FID). The running time improvement occurs because our new optimality-based threshold criterion reduces the number of expensive iterates of the generating networks, while increasing the number of actor-critic iterations.
Submitted 7 June, 2019;
originally announced June 2019.
-
Multivariate Distributionally Robust Convex Regression under Absolute Error Loss
Authors:
Jose Blanchet,
Peter W. Glynn,
Jun Yan,
Zhengqing Zhou
Abstract:
This paper proposes a novel non-parametric multidimensional convex regression estimator which is designed to be robust to adversarial perturbations in the empirical measure. We minimize over convex functions the maximum (over Wasserstein perturbations of the empirical measure) of the absolute regression errors. The inner maximization is solved in closed form, resulting in a regularization penalty that involves the norm of the gradient. We show consistency of our estimator and a rate of convergence of order $\widetilde{O}\left(n^{-1/d}\right)$, matching the bounds of alternative estimators based on square-loss minimization. Contrary to all of the existing results, our convergence rates hold without imposing compactness on the underlying domain and with no a priori bounds on the underlying convex function or its gradient norm.
Submitted 25 July, 2020; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Unbiased Multilevel Monte Carlo: Stochastic Optimization, Steady-state Simulation, Quantiles, and Other Applications
Authors:
Jose H. Blanchet,
Peter W. Glynn,
Yanan Pei
Abstract:
We present general principles for the design and analysis of unbiased Monte Carlo estimators in a wide range of settings. Our estimators possess finite work-normalized variance under mild regularity conditions. We apply our estimators to various settings of interest, including unbiased optimization in Sample Average Approximations, unbiased steady-state simulation of regenerative processes, quantile estimation, and nested simulation problems.
Submitted 22 April, 2019;
originally announced April 2019.
-
Affine Jump-Diffusions: Stochastic Stability and Limit Theorems
Authors:
Xiaowei Zhang,
Peter W. Glynn
Abstract:
Affine jump-diffusions constitute a large class of continuous-time stochastic models that are particularly popular in finance and economics due to their analytical tractability. Methods for parameter estimation for such processes require ergodicity in order to establish consistency and asymptotic normality of the associated estimators. In this paper, we develop stochastic stability conditions for affine jump-diffusions, thereby providing the needed large-sample theoretical support for estimating such processes. We establish ergodicity for such models by imposing a 'strong mean reversion' condition and a mild condition on the distribution of the jumps, i.e., the finiteness of a logarithmic moment. Exponential ergodicity holds if the jumps have a finite moment of a positive order. In addition, we prove strong laws of large numbers and functional central limit theorems for additive functionals for this class of models.
Submitted 31 October, 2018;
originally announced November 2018.
-
A Probabilistic Proof of the Perron-Frobenius Theorem
Authors:
Peter W. Glynn,
Paritosh Y. Desai
Abstract:
The Perron-Frobenius theorem plays an important role in many areas of management science and operations research. This paper provides a probabilistic perspective on the theorem, by discussing a proof that exploits a probabilistic representation of the Perron-Frobenius eigenvalue and eigenvectors in terms of the dynamics of a Markov chain. The proof provides conditions in both the finite-dimensional and infinite-dimensional settings under which the Perron-Frobenius eigenvalue and eigenvectors exist. Furthermore, the probabilistic representations that arise can be used to produce a Monte Carlo algorithm for computing the Perron-Frobenius eigenvalue and eigenvectors that will be explored elsewhere.
Submitted 15 August, 2018;
originally announced August 2018.
-
Approximating Systems Fed by Poisson Processes with Rapidly Changing Arrival Rates
Authors:
Zeyu Zheng,
Harsha Honnappa,
Peter W. Glynn
Abstract:
This paper introduces a new asymptotic regime for simplifying stochastic models having non-stationary effects, such as those that arise in the presence of time-of-day effects. This regime describes an operating environment within which the arrival process to a service system has an arrival intensity that is fluctuating rapidly. We show that such a service system is well approximated by the corresponding model in which the arrival process is Poisson with a constant arrival rate. In addition to the basic weak convergence theorem, we also establish a first order correction for the distribution of the cumulative number of arrivals over $[0,t]$, as well as the number-in-system process for an infinite-server queue fed by an arrival process having a rapidly changing arrival rate. This new asymptotic regime provides a second regime within which non-stationary stochastic models can be reasonably approximated by a process with stationary dynamics, thereby complementing the previously studied setting within which rates vary slowly in time.
Submitted 18 July, 2018;
originally announced July 2018.
-
A $c/μ$-Rule for Service Resource Allocation in Group-Server Queues
Authors:
Li Xia,
Zhe George Zhang,
Quan-Lin Li,
Peter W. Glynn
Abstract:
In this paper, we study a dynamic on/off server scheduling problem in a queueing system with multi-class servers, where servers are heterogeneous and can be classified into $K$ groups. Servers in the same group are homogeneous. A scheduling policy determines the number of working servers (servers that are turned on) in each group at every state $n$ (number of customers in the system). Our goal is to find the optimal scheduling policy to minimize the long-run average cost, which consists of an increasing convex holding cost and a linear operating cost. We use the sensitivity-based optimization theory to characterize the optimal policy. A necessary and sufficient condition of the optimal policy is derived. We also prove that the optimal policy has monotone structures and a quasi bang-bang control is optimal. We find that the optimal policy is indexed by the value of $c - μG(n)$, where $c$ is the operating cost rate, $μ$ is the service rate for a server, and $G(n)$ is a computable quantity called perturbation realization factor. Specifically, the group with smaller negative $c - μG(n)$ is more preferred to be turned on, while the group with positive $c - μG(n)$ should be turned off. However, the preference ranking of each group is affected by $G(n)$ and the preference order may change with the state $n$, the arrival rate, and the cost function. Under a reasonable condition of scale economies, we further prove that the optimal policy obeys a so-called $c$/$μ$-rule. That is, the servers with smaller $c$/$μ$ should be turned on with higher priority and the preference order of groups remains unchanged. This rule can be viewed as a sister version of the famous $cμ$-rule for polling queues. With the monotone property of $G(n)$, we further prove that the optimal policy has a multi-threshold structure when the $c$/$μ$-rule is applied.
Submitted 11 September, 2018; v1 submitted 14 July, 2018;
originally announced July 2018.
-
Approximating Performance Measures for Slowly Changing Non-stationary Markov Chains
Authors:
Zeyu Zheng,
Harsha Honnappa,
Peter W. Glynn
Abstract:
This paper is concerned with the development of rigorous approximations to various expectations associated with Markov chains and processes having non-stationary transition probabilities. Such non-stationary models arise naturally in contexts in which time-of-day effects or seasonality effects need to be incorporated. Our approximations are valid asymptotically in regimes in which the transition probabilities change slowly over time. Specifically, we develop approximations for the expected infinite horizon discounted reward, the expected reward to the hitting time of a set, the expected reward associated with the state occupied by the chain at time $n$, and the expected cumulative reward over an interval $[0,n]$. In each case, the approximation involves a linear system of equations identical in form to that which one would need to solve to compute the corresponding quantity for a Markov model having stationary transition probabilities. In that sense, the theory provides an approximation no harder to compute than in the traditional stationary context. While most of the theory is developed for finite state Markov chains, we also provide generalizations to continuous state Markov chains, and finite state Markov jump processes in continuous time. In the latter context, one of our approximations coincides with the uniform acceleration asymptotic due to Massey and Whitt (1998).
Submitted 4 May, 2018;
originally announced May 2018.
-
Probabilistic Contraction Analysis of Iterated Random Operators
Authors:
Abhishek Gupta,
Rahul Jain,
Peter Glynn
Abstract:
In many branches of engineering, the Banach contraction mapping theorem is employed to establish the convergence of certain deterministic algorithms. Randomized versions of these algorithms have been developed that have proved useful in data-driven problems. In a class of randomized algorithms, in each iteration, the contraction map is approximated with an operator that uses independent and identically distributed samples of certain random variables. This leads to iterated random operators acting on an initial point in a complete metric space, and it generates a Markov chain. In this paper, we develop a new stochastic dominance based proof technique, called probabilistic contraction analysis, for establishing the convergence in probability of Markov chains generated by such iterated random operators in a certain limiting regime. The methods developed in this paper provide a general framework for understanding convergence of a wide variety of Monte Carlo methods in which a contractive property is present. We apply the convergence result to conclude the convergence of fitted value iteration and fitted relative value iteration in continuous state and continuous action Markov decision problems as representative applications of the general framework developed here.
Submitted 21 September, 2023; v1 submitted 3 April, 2018;
originally announced April 2018.
-
Smoothed Variable Sample-size Accelerated Proximal Methods for Nonsmooth Stochastic Convex Programs
Authors:
Afrooz Jalilzadeh,
Uday V. Shanbhag,
Jose H. Blanchet,
Peter W. Glynn
Abstract:
We consider minimizing $f(x) = \mathbb{E}[f(x,ω)]$ when $f(x,ω)$ is possibly nonsmooth and either strongly convex or convex in $x$. (I) Strongly convex. When $f(x,ω)$ is $μ$-strongly convex in $x$, we propose a variable sample-size accelerated proximal scheme (VS-APM) and apply it on $f_η(x)$, the ($η$-)Moreau smoothed variant of $\mathbb{E}[f(x,ω)]$; we term such a scheme (m-VS-APM). We consider three settings. (a) Bounded domains. In this setting, VS-APM displays linear convergence in inexact gradient steps, each of which requires utilizing an inner (SSG) scheme. Specifically, mVS-APM achieves an optimal oracle complexity in SSG steps; (b) Unbounded domains. In this regime, under a weaker assumption of suitable state-dependent bounds on subgradients, an unaccelerated variant mVS-PM is linearly convergent; (c) Smooth ill-conditioned $f$. When $f$ is $L$-smooth and $κ = L/μ \ggg 1$, we employ mVS-APM where increasingly accurate gradients $\nabla_x f_η(x)$ are obtained by VS-APM. Notably, mVS-APM displays linear convergence and near-optimal complexity in inner proximal evaluations (up to a log factor) compared to VS-APM. But, unlike a direct application of VS-APM, this scheme is characterized by larger steplengths and better empirical behavior; (II) Convex. When $f(x,ω)$ is merely convex but smoothable, by suitable choices of the smoothing, steplength, and batch-size sequences, smoothed VS-APM (or sVS-APM) produces sequences for which expected sub-optimality diminishes at the rate of $\mathcal{O}(1/k)$ with an optimal oracle complexity of $\mathcal{O}(1/ε^2)$. Finally, sVS-APM and VS-APM produce sequences that converge almost surely to a solution of the original problem.
Submitted 6 October, 2022; v1 submitted 1 March, 2018;
originally announced March 2018.
-
On Gaussian Limits and Large Deviations for Queues Fed by High Intensity Randomly Scattered Traffic
Authors:
Peter W. Glynn,
Harsha Honnappa
Abstract:
We study a single server FIFO queue that offers general service. Each of $n$ customers enters the queue at random time epochs that are independent and identically distributed. We call this the random scattering traffic model, and the queueing model RS/G/1. We study the workload process associated with the queue in two different settings. First, we present Gaussian process approximations in a high intensity asymptotic scale and characterize the transient distribution of the approximation. Second, we study the rare event paths of the workload by proving a large deviations principle in the same high intensity regime. We also obtain exact asymptotics for the Gaussian approximations developed earlier. This analysis significantly extends and simplifies recent work in [1] on uniform population acceleration asymptotics to the queue length and workload in the RS/G/1 queue.
Submitted 18 August, 2017;
originally announced August 2017.