Showing 1–11 of 11 results for author: Prashanth, L A

Searching in archive cs.
  1. arXiv:2505.17777  [pdf, ps, other]

    cs.LG

    Optimizing Shortfall Risk Metric for Learning Regression Models

    Authors: Harish G. Ramaswamy, L. A. Prashanth

    Abstract: We consider the problem of estimating and optimizing utility-based shortfall risk (UBSR) of a loss, say $(Y - \hat Y)^2$, in the context of a regression problem. Empirical risk minimization with a UBSR objective is challenging since UBSR is a non-linear function of the underlying distribution. We first derive a concentration bound for UBSR estimation using independent and identically distributed (…

    Submitted 11 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2406.07892  [pdf, ps, other]

    cs.LG cs.AI

    A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP

    Authors: Tejaram Sangadi, L. A. Prashanth, Krishna Jagannathan

    Abstract: Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linear function approximation (LFA) for policy evaluation. We derive finite-sample bounds that hold (i) in the mean-squared sense and (ii) with high probability under…

    Submitted 12 March, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2405.20933  [pdf, ps, other]

    cs.LG stat.ML

    Concentration Bounds for Optimized Certainty Equivalent Risk Estimation

    Authors: Ayon Ghosh, L. A. Prashanth, Krishna Jagannathan

    Abstract: We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as concentration bounds (assuming sub-Gaussianity). Further, we analyze an efficient stochastic approximation-based OCE estimator, and derive finite sample b…

    Submitted 31 May, 2024; originally announced May 2024.

  4. arXiv:2310.11389  [pdf, ps, other]

    cs.LG stat.ML

    Risk Estimation in a Markov Cost Process: Lower and Upper Bounds

    Authors: Gugan Thoppe, L. A. Prashanth, Sanjay Bhat

    Abstract: We tackle the problem of estimating risk measures of the infinite-horizon discounted cost within a Markov cost process. The risk measures we study include variance, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR). First, we show that estimating any of these risk measures with $\epsilon$-accuracy, either in expected or high-probability sense, requires at least $\Omega(1/\epsilon^2)$ samples. Then, using a t…

    Submitted 11 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

  5. arXiv:2212.10477  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias

    Authors: Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth

    Abstract: We present in this paper a family of generalized simultaneous perturbation-based gradient search (GSPGS) estimators that use noisy function measurements. The number of function measurements required by each estimator is guided by the desired level of accuracy. We first present in detail unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators and later present t…

    Submitted 12 November, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: The material in this paper was presented in part at the Conference on Information Sciences and Systems (CISS) in March 2023

  6. arXiv:2203.16810  [pdf, ps, other]

    cs.LG

    Minimum mean-squared error estimation with bandit feedback

    Authors: Ayon Ghosh, L. A. Prashanth, Dipayan Sen, Aditya Gopalan

    Abstract: We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We propose two MSE estimators, and analyze their concentration properties. The first estimator is non-adaptive, as it is tied to a predetermined $m$-subset and lacks the flexibility to transition to…

    Submitted 2 May, 2025; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: A two-page extended abstract version of this paper appeared in the Proceedings of the Ninth Indian Control Conference (ICC), 2023

  7. arXiv:2111.08805  [pdf, ps, other]

    stat.ML cs.LG q-fin.RM

    Online Estimation and Optimization of Utility-Based Shortfall Risk

    Authors: Vishwajit Hegde, Arvind S. Menon, L. A. Prashanth, Krishna Jagannathan

    Abstract: Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly popular in financial applications, owing to certain desirable properties that it enjoys. We consider the problem of estimating UBSR in a recursive setting, where samples from the underlying loss distribution are available one at a time. We cast the UBSR estimation problem as a root-finding problem, and propose stochastic app…

    Submitted 27 November, 2023; v1 submitted 16 November, 2021; originally announced November 2021.

  8. arXiv:1411.3224  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

    Authors: Nathaniel Korda, L. A. Prashanth

    Abstract: We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. These include high-probability bounds as well as bounds in expectation. Our analysis suggests that a step-size inversely proportional to the number of iterations cannot guarantee the optimal rate of convergence unless we assume (partial) knowledge of the stationary distr…

    Submitted 1 September, 2015; v1 submitted 12 November, 2014; originally announced November 2014.

  9. arXiv:1403.4514  [pdf, ps, other]

    math.OC cs.LG

    Simultaneous Perturbation Algorithms for Batch Off-Policy Search

    Authors: Raphael Fonteneau, L. A. Prashanth

    Abstract: We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces. Given a batch collection of trajectories, we perform off-line policy evaluation using an algorithm similar to that by [Fonteneau et al., 2010]. Using this Monte-Carlo like policy evaluator, we perform policy search in a class of parameterized polic…

    Submitted 31 March, 2014; v1 submitted 18 March, 2014; originally announced March 2014.

  10. arXiv:1401.2086  [pdf, ps, other]

    cs.GT cs.LG stat.ML

    Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

    Authors: H. L. Prasad, L. A. Prashanth, Shalabh Bhatnagar

    Abstract: We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from Filar and Vrieze [2004] to an $N$-player setting and break down this problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution poi…

    Submitted 2 July, 2015; v1 submitted 8 January, 2014; originally announced January 2014.

  11. arXiv:1306.2557  [pdf, other]

    cs.LG stat.ML

    Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling

    Authors: L. A. Prashanth, Nathaniel Korda, Rémi Munos

    Abstract: We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm. Our proposed scheme is equivalent to running regular temporal difference learning with linear function approximation, albeit with samples picked uniformly from a given dataset. Our method results in an $O(d)$ improvement in comple…

    Submitted 24 January, 2020; v1 submitted 11 June, 2013; originally announced June 2013.