Search | arXiv e-print repository

Optimal Storage Design: An $L^{\infty}$ infused Inventory Control

Authors: Madhu Dhiman, Veeraruna Kavitha, Nandyala Hemachandra

Abstract: Inventory control typically considers controlling the price and the production rate. However, such systems have rigidity towards altering the physical storage capacity -- one cannot easily alter the physical size after the initial design. The paper focuses on this critical aspect, consideration of which leads to a non-standard control problem. Here, the objective is a weighted combination of the classical integral term (formed by usual inventory costs) and an $L^{\infty}$ term (the maximum inventory level in the entire planning horizon). Our approach is to consider an additional state component to capture the 'instantaneous' $L^{\infty}$ term (maximum inventory level till that instant) by virtue of which, we could convert the problem to the classical framework. For the direct ($L^{\infty}$) problem, we first identify a relation between the optimal price and the production rate policy, thereby reducing the dimensionality of the problem. By numerically solving a smooth variant of the converted problem, we obtain an optimal policy that illustrates a significant reduction in the storage capacity requirement. Interestingly, the loss in the revenue is negligible (less than $6\%$). As the importance of the $L^{\infty}$ component increases, the variations in the corresponding optimal inventory-level trajectory reduce. In the scenarios with partial/zero information about future demand curves, the above observation provides guidance -- one should continually tune the policies to maintain instantaneous inventory-levels as close to zero as possible. With such a policy, the reduction in revenue is negligible, while having significant improvements for storage capacity. We theoretically establish certain interesting properties of the optimal policy, which also support the above guidance.

Submitted 8 June, 2024; originally announced June 2024.
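
The state-augmentation idea described in the abstract of "Optimal Storage Design" can be sketched as follows. This is only an illustrative reading of the abstract, not the paper's formulation: the symbols $x$, $u$, $g$, $\alpha$, $m$ and $T$ are hypothetical names introduced here, and the objective is written as a cost to be minimised.

```latex
% Hypothetical notation (not taken from the paper):
%   x(t)  inventory level,  u(t)  controls (price, production rate),
%   g     running inventory cost,  \alpha \ge 0  weight on the L^\infty term,
%   T     planning horizon.
\begin{align*}
  \text{original objective:}\quad
    & J(u) = \int_0^T g\bigl(x(t),u(t),t\bigr)\,dt \;+\; \alpha \sup_{0\le t\le T} x(t), \\
  \text{additional state:}\quad
    & m(t) = \sup_{0\le s\le t} x(s), \qquad m(0) = x(0), \\
  \text{converted objective:}\quad
    & J(u) = \int_0^T g\bigl(x(t),u(t),t\bigr)\,dt \;+\; \alpha\, m(T).
\end{align*}
```

With the running maximum $m$ carried as a state, the $L^{\infty}$ term becomes an ordinary terminal cost on the pair $(x, m)$, which is the classical integral-plus-terminal form. The running maximum is nondecreasing and in general not smooth at the instants where $x$ reaches a new maximum, which is consistent with the abstract's mention of solving a smooth variant numerically.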

arXiv:2303.07834

Finite-Horizon Constrained MDPs With Both Additive And Multiplicative Utilities

Authors: Uday Kumar M, Sanjay P Bhat, Veeraruna Kavitha, Nandyala Hemachandra

Abstract: This paper considers the problem of finding a solution to the finite horizon constrained Markov decision processes (CMDP) where the objective as well as constraints are sums of additive and multiplicative utilities. Towards solving this, we construct another CMDP, with only additive utilities under a restricted set of policies, whose optimal value is equal to that of the original CMDP. Furthermore, we provide a finite dimensional bilinear program (BLP) whose value equals the CMDP value and whose solution provides the optimal policy. We also suggest an algorithm to solve this BLP.

Submitted 15 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

arXiv:2209.14963

Approximate Solutions To Constrained Risk-Sensitive Markov Decision Processes

Authors: Uday Kumar M, Sanjay P Bhat, Veeraruna Kavitha, Nandyala Hemachandra

Abstract: This paper considers the problem of finding near-optimal Markovian randomized (MR) policies for finite-state-action, infinite-horizon, constrained risk-sensitive Markov decision processes (CRSMDPs). Constraints are in the form of standard expected discounted cost functions as well as expected risk-sensitive discounted cost functions over finite and infinite horizons. The main contribution is to show that the problem possesses a solution if it is feasible, and to provide two methods for finding an approximate solution in the form of an ultimately stationary (US) MR policy. The latter is achieved through two approximating finite-horizon CRSMDPs which are constructed from the original CRSMDP by time-truncating the original objective and constraint cost functions, and suitably perturbing the constraint upper bounds. The first approximation gives a US policy which is $ε$-optimal and feasible for the original problem, while the second approximation gives a near-optimal US policy whose violation of the original constraints is bounded above by a specified $ε$. A key step in the proofs is an appropriate choice of a metric that makes the set of infinite-horizon MR policies and the feasible regions of the three CRSMDPs compact, and the objective and constraint functions continuous. A linear-programming-based formulation for solving the approximating finite-horizon CRSMDPs is also given.

Submitted 29 September, 2022; originally announced September 2022.

Comments: 38 pages

arXiv:2109.01654

Multi-agent Natural Actor-critic Reinforcement Learning Algorithms

Authors: Prashant Trivedi, Nandyala Hemachandra

Abstract: Multi-agent actor-critic algorithms are an important part of the Reinforcement Learning paradigm. We propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms in this work. The objective is to collectively find a joint policy that maximizes the average long-term return of these agents. In the absence of a central controller and to preserve privacy, agents communicate some information to their neighbors via a time-varying communication network. We prove convergence of all three MAN algorithms to a globally asymptotically stable set of the ODE corresponding to the actor update; these use linear function approximations. We show that the Kullback-Leibler divergence between policies of successive iterates is proportional to the objective function's gradient. We observe that the minimum singular value of the Fisher information matrix is well within the reciprocal of the policy parameter dimension. Using this, we theoretically show that the optimal value of the deterministic variant of the MAN algorithm at each iterate dominates that of the standard gradient-based multi-agent actor-critic (MAAC) algorithm. To our knowledge, this is the first such result in multi-agent reinforcement learning (MARL). To illustrate the usefulness of our proposed algorithms, we implement them on a bi-lane traffic network to reduce the average network congestion. We observe an almost 25% reduction in the average congestion in two MAN algorithms; the average congestion in the third MAN algorithm is on par with the MAAC algorithm. We also consider a generic 15-agent MARL; the performance of the MAN algorithms is again as good as the MAAC algorithm.

Submitted 2 April, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: A very high-level summary of our revision is: In Section 3.5, we theoretically prove that the objective function value from the deterministic variant of MAN algorithms dominates that of the MAAC algorithm under some minimal conditions. It relies on the Lemma 2 of our paper: the minimum singular value of the Fisher information matrix is well within the reciprocal of the policy parameter dimension

arXiv:2008.07330

Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure

Authors: Puja Sahu, Nandyala Hemachandra

Abstract: We investigate optimal posteriors for recently introduced \cite{begin2016pac} chi-squared divergence based PAC-Bayesian bounds in terms of nature of their distribution, scalability of computations, and test set performance. For a finite classifier set, we deduce bounds for three distance functions: KL-divergence, linear and squared distances. Optimal posterior weights are proportional to deviations of empirical risks, usually with subset support. For uniform prior, it is sufficient to search among posteriors on classifier subsets ordered by these risks. We show the bound minimization for linear distance as a convex program and obtain a closed-form expression for its optimal posterior. In contrast, that for squared distance is a quasi-convex program under a specific condition, and the one for KL-divergence is non-convex optimization (a difference of convex functions). To compute such optimal posteriors, we derive fast converging fixed point (FP) equations. We apply these approaches to a finite set of SVM regularization parameter values to yield stochastic SVMs with tight bounds. We perform a comprehensive performance comparison between our optimal posteriors and known KL-divergence based posteriors on a variety of UCI datasets with varying ranges and variances in risk values, etc. Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors. Our study highlights the impact of divergence function on the performance of PAC-Bayesian classifiers. We compare our stochastic classifiers with a cross-validation based deterministic classifier. The latter has better test errors, but ours is more sample robust, has quantifiable generalization guarantees, and is computationally much faster.

Submitted 13 August, 2020; originally announced August 2020.

Comments: arXiv admin note: text overlap with arXiv:1912.06803

arXiv:1810.08021

On a Conjecture for Dynamic Priority Queues and Nash Equilibrium for Quality of Service Sensitive Markets

Authors: Manu K. Gupta, N. Hemachandra

Abstract: Many economic transactions, including those of online markets, have a time lag between the start and end times of transactions. Customers need to wait for completion of their transaction (order fulfillment) and hence are also interested in their waiting time as a Quality of Service (QoS) attribute. So, they factor this QoS in the demand they offer to the firm (service-provider) and some customers (user-set) would be willing to pay for shorter waiting times. On the other hand, such waiting times depend on the demand the user-set offers to the service-provider. We model the above economic-QoS strategic interaction between service-provider and user-set under a fairly generic scheduling framework as a non-cooperative constrained game. We use an existing joint pricing and scheduling model. An optimal solution to this joint pricing and scheduling problem was guaranteed by a finite step algorithm subject to a conjecture. We first settle this conjecture based on queuing and optimization arguments and discuss its implications on the above game. We show that a continuum of Nash equilibria (NE) exists and it can be computed easily using constrained best response dynamics. The revenue maximal NE is identified by the above finite step algorithm. We illustrate how both players can benefit at such revenue maximal NE by identifying suitable operational decisions, i.e., by choosing an appropriate game along the theme of pricing and revenue management.

Submitted 18 October, 2018; originally announced October 2018.

arXiv:1605.00977

Blackwell-Nash Equilibrium for Discrete and Continuous Time Stochastic Games

Authors: Vikas Vikram Singh, N. Hemachandra

Abstract: We consider both discrete and continuous time finite state-action stochastic games. In discrete time stochastic games, it is known that a stationary Blackwell-Nash equilibrium (BNE) exists for a single controller additive reward (SC-AR) stochastic game which is a special case of a general stochastic game. We show that, in general, the additive reward condition is needed for the existence of a BNE. We give an example of a single controller stochastic game which does not satisfy the additive reward condition. We show that this example does not have a stationary BNE. For a general discrete time discounted stochastic game we give two different sets of conditions and show that a stationary Nash equilibrium that satisfies any set of conditions is a BNE. One of these sets of conditions weakens a set of conditions available in the literature. For continuous time stochastic games, we give an example that does not have a stationary BNE. In fact, this example is a single controller continuous time stochastic game. Then, we introduce a continuous time SC-AR stochastic game. We show that there always exists a stationary deterministic BNE for a continuous time SC-AR stochastic game. For a general continuous time discounted stochastic game we give two different sets of conditions and show that a Nash equilibrium that satisfies any set of conditions is a BNE.

Submitted 3 May, 2016; originally announced May 2016.

MSC Class: 91A05; 91A10; 91A15; 90C40

arXiv:1206.1672

A mathematical programming based characterization of Nash equilibria of some constrained stochastic games

Authors: Vikas Vikram Singh, N. Hemachandra

Abstract: We consider two classes of constrained finite state-action stochastic games. First, we consider a two player nonzero sum single controller constrained stochastic game with both average and discounted cost criterion. We consider the same type of constraints as in [1], i.e., player 1 has subscription based constraints and player 2, who controls the transition probabilities, has realization based constraints which can also depend on the strategies of player 1. Next, we consider an N-player nonzero sum constrained stochastic game with independent state processes where each player has average cost criterion as discussed in [2]. We show that the stationary Nash equilibria of both classes of constrained games, which exist under strong Slater and irreducibility conditions [3], [2], have a one-to-one correspondence with global minima of certain mathematical programs. In the single controller game if the constraints of player 2 do not depend on the strategies of player 1, then the mathematical program reduces to a non-convex quadratic program. In the two player independent state processes stochastic game if the constraints of a player do not depend on the strategies of another player, then the mathematical program reduces to a non-convex quadratic program. Computational algorithms for finding global minima of non-convex quadratic programs exist [4], [5] and hence, one can compute Nash equilibria of these constrained stochastic games. Our results generalize some existing results for zero sum games [1], [6], [7].

Submitted 8 June, 2012; originally announced June 2012.

MSC Class: 91A10; 91A15; 90C05; 90C20; 90C26

arXiv:math/0212006

Bounds for covariances and variances of truncated random variables

Authors: N. Hemachandra, V. Cheriyan

Abstract: We show that a lower bound for covariance of $\min(X_1,X_2)$ and $\max(X_1,X_2)$ is $\mathrm{Cov}(X_1,X_2)$ and an upper bound for variance of $\min(X_2,\max(X,X_1))$ is $\mathrm{Var}(X) + \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$, generalizing previous results. We also characterize the cases where these bounds are sharp.

Submitted 1 December, 2002; originally announced December 2002.

Comments: 7 pages. Revised during October 2002

Report number: 02_2002 MSC Class: 60
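
The covariance bound stated in the abstract of "Bounds for covariances and variances of truncated random variables" can be checked numerically with a small sketch. This is an illustration, not code from the paper. The correlated Gaussian sampling model and the names `sample_cov` and `covariance_gap` are assumptions made here. Since $\min + \max = X_1 + X_2$ and $\min \cdot \max = X_1 X_2$ hold pointwise, the gap between the two covariances is a difference of products of means, so it should be nonnegative for sample moments as well, up to floating-point rounding.

```python
import random


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def covariance_gap(n=10_000, rho=0.5, seed=0):
    """Return Cov(min, max) - Cov(X1, X2) estimated from n samples.

    Assumed model (for illustration only): X1 is standard normal and
    X2 is a shifted normal with correlation rho to X1.
    """
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1.append(z1)
        x2.append(rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2 + 1.0)
    lo = [min(a, b) for a, b in zip(x1, x2)]
    hi = [max(a, b) for a, b in zip(x1, x2)]
    return sample_cov(lo, hi) - sample_cov(x1, x2)


if __name__ == "__main__":
    # The printed gap should be nonnegative for every correlation level.
    for rho in (-0.8, 0.0, 0.8):
        print(rho, covariance_gap(rho=rho))
```

The sketch covers only the covariance lower bound. The variance upper bound is left out because the abstract does not state the assumptions under which it holds.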

Showing 1–9 of 9 results for author: Hemachandra, N