Showing 1–15 of 15 results for author: Sekhari, A

  1. arXiv:2504.05405  [pdf, other]

    cs.LG cs.AI stat.ML

    The Role of Environment Access in Agnostic Reinforcement Learning

    Authors: Akshay Krishnamurthy, Gene Li, Ayush Sekhari

    Abstract: We study Reinforcement Learning (RL) in environments with large state spaces, where function approximation is required for sample-efficient learning. Departing from a long history of prior work, we consider the weakest possible form of function approximation, called agnostic policy learning, where the learner seeks to find the best policy in a given class $Π$, with no guarantee that $Π$ contains a…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: comments welcome

  2. arXiv:2403.17091  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data

    Authors: Zeyu Jia, Alexander Rakhlin, Ayush Sekhari, Chen-Yu Wei

    Abstract: We revisit the problem of offline reinforcement learning with value function realizability but without Bellman completeness. Previous work by Xie and Jiang (2021) and Foster et al. (2022) left open the question whether a bounded concentrability coefficient along with trajectory-based offline data admits a polynomial sample complexity. In this work, we provide a negative answer to this question for…

    Submitted 25 March, 2024; originally announced March 2024.

  3. arXiv:2401.09681  [pdf, other]

    cs.LG stat.ML

    Harnessing Density Ratios for Online Reinforcement Learning

    Authors: Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie

    Abstract: The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of the possibility for a unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other. However, the notion of density ratio modeling, an emerging paradigm in offline RL, has been largely absent from online RL, perhaps for goo…

    Submitted 4 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  4. arXiv:2311.08384  [pdf, other]

    cs.LG cs.AI stat.ML

    Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

    Authors: Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun

    Abstract: Hybrid RL is the setting where an RL agent has access to both offline data and online data by interacting with the real-world environment. In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data. On-policy methods such as policy gradient and natural policy gradient (NPG) have shown to be more robust to model misspecification, though somet…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: The first two authors contributed equally

  5. arXiv:2310.06113  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    When is Agnostic Reinforcement Learning Statistically Tractable?

    Authors: Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro

    Abstract: We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $Π$, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an $ε$-suboptimal policy with respect to $Π$? Towards that end, we introduce a new complexity measure, called the \emph{spanning capacity}, that depends solely on the set $Π$ and is ind…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  6. arXiv:2307.04998  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Selective Sampling and Imitation Learning via Online Regression

    Authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

    Abstract: We consider the problem of Imitation Learning (IL) by actively querying noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be sho…

    Submitted 10 July, 2023; originally announced July 2023.

  7. arXiv:2306.15744  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Ticketed Learning-Unlearning Schemes

    Authors: Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Ayush Sekhari, Chiyuan Zhang

    Abstract: We consider the learning--unlearning paradigm defined as follows. First given a dataset, the goal is to learn a good predictor, such as one minimizing a certain loss. Subsequently, given any subset of examples that wish to be unlearnt, the goal is to learn, without the knowledge of the original training dataset, a good predictor that is identical to the predictor that would have been produced when…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Conference on Learning Theory (COLT) 2023

  8. arXiv:2211.14250  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Model-Free Reinforcement Learning with the Decision-Estimation Coefficient

    Authors: Dylan J. Foster, Noah Golowich, Jian Qian, Alexander Rakhlin, Ayush Sekhari

    Abstract: We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which ach…

    Submitted 12 August, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: V2 changes: Improved writing and added more examples

  9. arXiv:2206.13063  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    On the Complexity of Adversarial Decision Making

    Authors: Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan

    Abstract: A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result…

    Submitted 27 June, 2022; originally announced June 2022.

  10. arXiv:2206.12081  [pdf, other]

    cs.LG stat.ME stat.ML

    Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

    Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

    Abstract: We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert space embeddings of POMDP where the feature of latent states and the feature of observations admit a conditional Hilbert space embedding of the observation emis…

    Submitted 24 June, 2022; originally announced June 2022.

  11. arXiv:2206.12020  [pdf, ps, other]

    cs.LG math.ST stat.ME stat.ML

    Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

    Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

    Abstract: We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new \textit{Partially Observable Bilinear Actor-Critic framework}, that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as we…

    Submitted 23 June, 2022; originally announced June 2022.

  12. arXiv:2006.13476  [pdf, other]

    cs.LG math.OC stat.ML

    Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We design an algorithm which finds an $ε$-approximate stationary point (with $\|\nabla F(x)\|\le ε$) using $O(ε^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and---surprisingly---tha…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted to Conference on Learning Theory (COLT) 2020

  13. arXiv:2005.03789  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforcement Learning with Feedback Graphs

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

    Abstract: We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations. Such additional observations are available in a range of tasks through extended sensors or prior knowledge about the environment (e.g., when certain actions yield similar outcome). We formalize this setting using a feedback graph…

    Submitted 7 May, 2020; originally announced May 2020.

  14. arXiv:1902.04686  [pdf, ps, other]

    cs.LG math.OC stat.ML

    The Complexity of Making the Gradient Small in Stochastic Convex Optimization

    Authors: Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

    Abstract: We give nearly matching upper and lower bounds on the oracle complexity of finding $ε$-stationary points ($\| \nabla F(x) \| \leqε$) in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimizatio…

    Submitted 14 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

  15. arXiv:1810.11059  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Uniform Convergence of Gradients for Non-Convex Learning and Optimization

    Authors: Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We investigate 1) the rate at which refined properties of the empirical risk---in particular, gradients---converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization. Our analysis follows the tradition of norm-based capacity control. We propose vector-valued Rademacher complexities as a simple, composable, and user-f…

    Submitted 11 November, 2018; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: To appear in Neural Information Processing Systems (NIPS) 2018