Showing 1–19 of 19 results for author: A., P L

  1. arXiv:2506.01101  [pdf, ps, other]

    cs.CE q-fin.MF stat.CO

    Learning to optimize convex risk measures: The cases of utility-based shortfall risk and optimized certainty equivalent risk

    Authors: Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat

    Abstract: We consider the problems of estimation and optimization of two popular convex risk measures: utility-based shortfall risk (UBSR) and Optimized Certainty Equivalent (OCE) risk. We extend these risk measures to cover possibly unbounded random variables. We cover prominent risk measures like the entropic risk, expectile risk, monotone mean-variance risk, Value-at-Risk, and Conditional Value-at-Risk a…

    Submitted 1 June, 2025; originally announced June 2025.

  2. arXiv:2504.20877  [pdf, other]

    stat.ML cs.LG

    Preference-centric Bandits: Optimality of Mixtures and Regret-efficient Algorithms

    Authors: Meltem Tatlı, Arpan Mukherjee, Prashanth L. A., Karthikeyan Shanmugam, Ali Tajer

    Abstract: The objective of canonical multi-armed bandits is to identify and repeatedly select an arm with the largest reward, often in the form of the expected value of the arm's probability distribution. Such a utilitarian perspective and focus on the probability models' first moments, however, is agnostic to the distributions' tail behavior and their implications for variability and risks in decision-maki…

    Submitted 30 April, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: An earlier version of this manuscript, which focused on risk-sensitive bandits, has appeared in the Proceedings of the 2025 International Conference on Artificial Intelligence and Statistics (AISTATS)

  3. arXiv:2503.08896  [pdf, other]

    stat.ML cs.LG

    Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms

    Authors: Meltem Tatlı, Arpan Mukherjee, Prashanth L. A., Karthikeyan Shanmugam, Ali Tajer

    Abstract: This paper introduces a general framework for risk-sensitive bandits that integrates the notions of risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The introduced framework subsumes the various existing risk-sensitive models. An important and hitherto unknown observation is that for a wide range of riskmetrics, the optimal bandit policy involves selecting a mixture of…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: AISTATS 2025

  4. arXiv:2409.05733  [pdf, ps, other]

    math.ST math.PR stat.ML

    Markov Chain Variance Estimation: A Stochastic Approximation Approach

    Authors: Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri

    Abstract: We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ computation at each step, does not require storing any historical samples or any prior knowledge of run-length, and has optimal $O(\frac{1}{n})$ rate of convergence for t…

    Submitted 22 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 62 pages, 1 table, added additional references

  5. arXiv:2304.10951  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

    Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the…

    Submitted 21 April, 2023; originally announced April 2023.

  6. arXiv:2210.05918  [pdf, ps, other]

    cs.LG cs.AI eess.SY stat.ML

    Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

    Authors: Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup

    Abstract: We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges…

    Submitted 19 September, 2024; v1 submitted 12 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, 2023

  7. arXiv:2205.05843  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    A Survey of Risk-Aware Multi-Armed Bandits

    Authors: Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan

    Abstract: In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: 11 pages; Unabridged version of a survey paper of the same title accepted to IJCAI-ECAI, 2022

  8. arXiv:2002.11440  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles

    Authors: Nirav Bhavsar, Prashanth L. A.

    Abstract: We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter. Our proposed oracles are appealing in several practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed (i.i.d.) samples, or simulation optimization, where the function measu…

    Submitted 16 May, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

  9. arXiv:1912.10398  [pdf, other]

    cs.LG stat.ML

    Estimation of Spectral Risk Measures

    Authors: Ajay Kumar Pandey, Prashanth L. A., Sanjay P. Bhat

    Abstract: We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples, and propose a novel method that is based on numerical integration. We show that our SRM estimate concentrates exponentially, when the underlying distribution has bounded support. Further, we also consider the case when the underlying distribution is either Gaussian or exponential, and derive a concentration bo…

    Submitted 22 December, 2019; originally announced December 2019.

  10. arXiv:1902.10709  [pdf, ps, other]

    math.ST cs.LG stat.ML

    A Wasserstein distance approach for concentration of empirical risk estimates

    Authors: Prashanth L. A., Sanjay P. Bhat

    Abstract: This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for two broad classes of risk measures defined in the paper. The classes of risk measures introduced include as special cases well known risk measures from the finance literature such as conditional value at risk (CVaR), optimized certainty equivalent risk, spectral risk meas…

    Submitted 10 May, 2022; v1 submitted 27 February, 2019; originally announced February 2019.

  11. arXiv:1902.02953  [pdf, ps, other]

    cs.LG stat.ML

    Correlated bandits or: How to minimize mean-squared error online

    Authors: Vinay Praneeth Boda, Prashanth L. A.

    Abstract: While the objective in traditional multi-armed bandit problems is to find the arm with the highest mean, in many settings, finding an arm that best captures information about other arms is of interest. This objective, however, requires learning the underlying correlation structure and not just the means of the arms. Sensor placement for industrial surveillance and cellular network monitoring are…

    Submitted 26 June, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

  12. arXiv:1901.00997  [pdf, ps, other]

    cs.LG stat.ML

    Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

    Authors: Prashanth L. A., Krishna Jagannathan, Ravi Kumar Kolla

    Abstract: Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such as finance. We derive concentration bounds for CVaR estimates, considering separately the cases of light-tailed and heavy-tailed distributions. In the light-tailed case, we use a classical CVaR estimator based on the empirical distribution constructed from the samples. For heavy-tailed random variables, we assume a…

    Submitted 25 August, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

  13. arXiv:1810.09126  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Risk-Sensitive Reinforcement Learning via Policy Gradient Search

    Authors: Prashanth L. A., Michael Fu

    Abstract: The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In practice, optimizing the expected value alone may not be satisfactory, in that it may be desirable to incorporate the notion of risk into the optimization problem formu…

    Submitted 23 May, 2022; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: To appear in "Foundations and Trends in Machine Learning"

  14. arXiv:1808.01739  [pdf, ps, other]

    cs.LG stat.ML

    Concentration bounds for empirical conditional value-at-risk: The unbounded case

    Authors: Ravi Kumar Kolla, Prashanth L. A., Sanjay P. Bhat, Krishna Jagannathan

    Abstract: In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Value-at-Risk (CVaR) is a popular risk measure for modeling the aforementioned objective. We consider the problem of estimating CVaR from i.i.d. samples of an unbou…

    Submitted 6 August, 2018; originally announced August 2018.

  15. arXiv:1611.10283  [pdf, ps, other]

    cs.LG stat.ML

    Bandit algorithms to emulate human decision making using probabilistic distortions

    Authors: Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

    Abstract: Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions: the classic $K$-armed bandit and the linearly parameterized bandit settings. We consider the aforementioned problems in the regret minimization as…

    Submitted 31 October, 2023; v1 submitted 30 November, 2016; originally announced November 2016.

    Comments: The material in this paper was presented in part at the 2017 AAAI Conference on Artificial Intelligence

  16. arXiv:1609.07087  [pdf, other]

    cs.LG stat.ML

    (Bandit) Convex Optimization with Biased Noisy Gradient Oracles

    Authors: Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

    Abstract: Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients. Depending on the properties of the function to be optimized and the nature of "noise" in the bandit feedback, the bias and variance of gradient estimates exhibit various tradeoffs. In t…

    Submitted 4 July, 2020; v1 submitted 22 September, 2016; originally announced September 2016.

  17. arXiv:1405.2690  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Policy Gradients for CVaR-Constrained MDPs

    Authors: Prashanth L. A.

    Abstract: We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR). We propose two algorithms that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini batches, policy gradients and importance sampling. Both the algorithms incorporate a CVaR estimation procedure, along the…

    Submitted 12 May, 2014; originally announced May 2014.

  18. arXiv:1403.6530  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

    Authors: Prashanth L. A., Mohammad Ghavamzadeh

    Abstract: In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounte…

    Submitted 18 March, 2015; v1 submitted 25 March, 2014; originally announced March 2014.

  19. arXiv:1307.3176  [pdf, other]

    cs.LG stat.ML

    Fast gradient descent for drifting least squares regression, with application to bandits

    Authors: Nathaniel Korda, Prashanth L. A., Rémi Munos

    Abstract: Online learning algorithms often need to recompute least squares regression estimates of parameters. We study improving the computational complexity of such algorithms by using stochastic gradient descent (SGD) type schemes in place of classic regression solvers. We show that SGD schemes efficiently track the true solutions of the regression problems, even in the presence of a drift. This findi…

    Submitted 20 November, 2014; v1 submitted 11 July, 2013; originally announced July 2013.