Showing 1–12 of 12 results for author: A., P L

  1. arXiv:2409.05733

    math.ST math.PR stat.ML

    Markov Chain Variance Estimation: A Stochastic Approximation Approach

    Authors: Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri

    Abstract: We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ computation at each step, does not require storing any historical samples or any prior knowledge of run-length, and has optimal $O(\frac{1}{n})$ rate of convergence for t…

    Submitted 22 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 62 pages, 1 table, added additional references

  2. arXiv:2304.10951

    cs.LG math.OC stat.ML

    A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

    Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the…

    Submitted 21 April, 2023; originally announced April 2023.

  3. arXiv:2208.00290

    math.OC cs.LG

    A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

    Authors: Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of…

    Submitted 30 June, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

  4. arXiv:2002.11440

    cs.LG math.OC stat.ML

    Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles

    Authors: Nirav Bhavsar, Prashanth L. A.

    Abstract: We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter. Our proposed oracles are appealing in several practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed (i.i.d.) samples, or simulation optimization, where the function measu…

    Submitted 16 May, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

  5. arXiv:1902.10709

    math.ST cs.LG stat.ML

    A Wasserstein distance approach for concentration of empirical risk estimates

    Authors: Prashanth L. A., Sanjay P. Bhat

    Abstract: This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for two broad classes of risk measures defined in the paper. The classes of risk measures introduced include as special cases well known risk measures from the finance literature such as conditional value at risk (CVaR), optimized certainty equivalent risk, spectral risk meas…

    Submitted 10 May, 2022; v1 submitted 27 February, 2019; originally announced February 2019.

  6. arXiv:1810.09126

    cs.LG math.OC stat.ML

    Risk-Sensitive Reinforcement Learning via Policy Gradient Search

    Authors: Prashanth L. A., Michael Fu

    Abstract: The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In practice, optimizing the expected value alone may not be satisfactory, in that it may be desirable to incorporate the notion of risk into the optimization problem formu…

    Submitted 23 May, 2022; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: To appear in "Foundations and Trends in Machine Learning"

  7. arXiv:1808.02871

    math.OC cs.LG

    Random directions stochastic approximation with deterministic perturbations

    Authors: Prashanth L. A., Shalabh Bhatnagar, Nirav Bhavsar, Michael Fu, Steven I. Marcus

    Abstract: We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms. In the latter case, these are the first second-order algorithms to incorporate deterministic perturbations. We show that the gradient and/or Hessian estimates in the resulting algorithms with deterministic perturb…

    Submitted 28 March, 2019; v1 submitted 8 August, 2018; originally announced August 2018.

  8. arXiv:1507.07984

    cs.LG math.OC

    A constrained optimization perspective on actor critic algorithms and application to network routing

    Authors: Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

    Abstract: We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routin…

    Submitted 28 July, 2015; originally announced July 2015.

  9. arXiv:1506.02632

    cs.LG math.OC

    Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

    Authors: Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári

    Abstract: Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting pre…

    Submitted 26 February, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

  10. arXiv:1502.05577

    math.OC cs.LG

    Adaptive system optimization using random directions stochastic approximation

    Authors: Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus

    Abstract: We present novel algorithms for simulation optimization using random directions stochastic approximation (RDSA). These include first-order (gradient) as well as second-order (Newton) schemes. We incorporate both continuous-valued as well as discrete-valued perturbations into both our algorithms. The former are chosen to be independent and identically distributed (i.i.d.) symmetric, uniformly distr…

    Submitted 8 August, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

  11. arXiv:1405.2690

    stat.ML cs.LG math.OC

    Policy Gradients for CVaR-Constrained MDPs

    Authors: Prashanth L. A.

    Abstract: We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR). We propose two algorithms that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini batches, policy gradients and importance sampling. Both the algorithms incorporate a CVaR estimation procedure, along the…

    Submitted 12 May, 2014; originally announced May 2014.

  12. arXiv:1403.6530

    cs.LG math.OC stat.ML

    Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

    Authors: Prashanth L. A., Mohammad Ghavamzadeh

    Abstract: In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounte…

    Submitted 18 March, 2015; v1 submitted 25 March, 2014; originally announced March 2014.