Showing 1–20 of 20 results for author: Agrawal, P

  1. arXiv:2410.08868  [pdf, ps, other]

    cs.LG stat.ML

    On the Convergence of Single-Timescale Actor-Critic

    Authors: Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

    Abstract: We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework for handling complex, coupled recursions inherent in the algorithm. Leveraging this framework, we establish that the algorithm converges to an $ε$-close …

    Submitted 4 June, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: updated version, 27 pages

  2. arXiv:2407.13743  [pdf, ps, other]

    cs.LG stat.ML

    Optimistic Q-learning for average reward and episodic reinforcement learning

    Authors: Priyank Agrawal, Shipra Agrawal

    Abstract: We present an optimistic Q-learning algorithm for regret minimization in average reward reinforcement learning under an additional assumption on the underlying MDP that for all policies, the time to visit some frequent state $s_0$ is finite and upper bounded by $H$, either in expectation or with constant probability. Our setting strictly generalizes the episodic setting and is significantly less r…

    Submitted 16 June, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: 37 pages, simplified proofs

  3. arXiv:2302.13934  [pdf, other]

    cs.LG stat.ML

    Statistical Learning under Heterogeneous Distribution Shift

    Authors: Max Simchowitz, Anurag Ajay, Pulkit Agrawal, Akshay Krishnamurthy

    Abstract: This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) +g_{\star}(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, $f \in F$ and $g \in G$, fit on a given training distri…

    Submitted 27 October, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  4. arXiv:2207.02200  [pdf, other]

    cs.LG cs.AI stat.ML

    Offline RL Policies Should be Trained to be Adaptive

    Authors: Dibya Ghosh, Anurag Ajay, Pulkit Agrawal, Sergey Levine

    Abstract: Offline RL algorithms must account for the fact that the dataset they are provided may leave many facets of the environment unknown. The most common way to approach this challenge is to employ pessimistic or conservative methods, which avoid behaviors that are too dissimilar from those in the training dataset. However, relying exclusively on conservatism has drawbacks: performance is sensitive to…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: ICML 2022 (long talk)

  5. arXiv:2206.04672  [pdf, other]

    cs.LG stat.ML

    Overcoming the Spectral Bias of Neural Value Approximation

    Authors: Ge Yang, Anurag Ajay, Pulkit Agrawal

    Abstract: Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Code and analysis available at https://geyang.github.io/ffn. First two authors contributed equally

  6. arXiv:2011.14033  [pdf, other]

    cs.LG stat.ML

    A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

    Authors: Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula

    Abstract: In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describe the products, and the mean utility of…

    Submitted 14 April, 2024; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: Bug fixed

  7. arXiv:2010.12163  [pdf, ps, other]

    cs.LG stat.ML

    Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

    Authors: Priyank Agrawal, Jinglin Chen, Nan Jiang

    Abstract: This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of one classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our $\tilde{\mathrm{O}}(H^2S\sqrt{AT})$ high-probability worst-case regret bound improves the previous sharpest…

    Submitted 9 November, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Updated version, bug fixed

  8. arXiv:2007.05105  [pdf, other]

    cs.LG stat.ML

    AdaScale SGD: A User-Friendly Algorithm for Distributed Training

    Authors: Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin

    Abstract: When using large-batch training to speed up stochastic gradient descent, learning rates must adapt to new batch sizes in order to maximize speed-ups and preserve model quality. Re-tuning learning rates is resource intensive, while fixed scaling rules often degrade model quality. We propose AdaScale SGD, an algorithm that reliably adapts learning rates to large-batch training. By continually adapti…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: ICML 2020

  9. arXiv:2006.10356  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

    Authors: Priyank Agrawal, Theja Tulabandhula

    Abstract: We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. In advertising and recommendation systems, repetition effect includes a wear-in period, where the user's propensity to reward the platform via a click or purchase depends on how frequently they see the recommendation in the recent past. It also includes a counteracting wear-out period, where th…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: Appears in the 36th Conference on Uncertainty in Artificial Intelligence (UAI 2020)

  10. arXiv:2001.07853  [pdf, other]

    cs.LG cs.IR stat.ML

    Incentivising Exploration and Recommendations for Contextual Bandits with Payments

    Authors: Priyank Agrawal, Theja Tulabandhula

    Abstract: We propose a contextual bandit based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare. We also calculate theore…

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: 11 pages, 4 figures

  11. arXiv:1907.07804  [pdf, other]

    cs.LG cs.CL cs.CV cs.NE stat.ML

    OmniNet: A unified architecture for multi-modal multi-task learning

    Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

    Abstract: Transformer is a popularly used neural network architecture, especially for language understanding. We introduce an extended and unified architecture that can be used for tasks involving a variety of modalities like image, text, videos, etc. We propose a spatio-temporal cache mechanism that enables learning spatial dimension of the input in addition to the hidden states corresponding to the tempor…

    Submitted 3 July, 2020; v1 submitted 17 July, 2019; originally announced July 2019.

    Comments: Source code available at: https://github.com/subho406/OmniNet

  12. arXiv:1811.09026  [pdf, other]

    cs.LG cs.AI stat.ML

    Bandits with Temporal Stochastic Constraints

    Authors: Priyank Agrawal, Theja Tulabandhula

    Abstract: We study the effect of impairment on stochastic multi-armed bandits and develop new ways to mitigate it. Impairment effect is the phenomenon where an agent only accrues reward for an action if they have played it at least a few times in the recent past. It is practically motivated by repetition and recency effects in domains such as advertising (here consumer behavior may require repeat actions by…

    Submitted 20 June, 2020; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: An extended abstract appeared in the 4th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019)

  13. arXiv:1810.11975  [pdf, other]

    cs.LG cs.CL stat.ML

    On Controllable Sparse Alternatives to Softmax

    Authors: Anirban Laha, Saneem A. Chemmengath, Priyanka Agrawal, Mitesh M. Khapra, Karthik Sankaranarayanan, Harish G. Ramaswamy

    Abstract: Converting an n-dimensional vector to a probability distribution over n objects is a commonly used component in many machine learning tasks like multiclass classification, multilabel classification, attention mechanisms etc. For this, several probability mapping functions have been proposed and employed in literature such as softmax, sum-normalization, spherical softmax, and sparsemax, but there i…

    Submitted 30 October, 2018; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: To appear in NIPS 2018, Total 16 pages including appendix

  14. arXiv:1809.08097  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Domain Adaptation under Deep Label Scarcity

    Authors: Amar Prakash Azad, Dinesh Garg, Priyanka Agrawal, Arun Kumar

    Abstract: The goal behind Domain Adaptation (DA) is to leverage the labeled examples from a source domain so as to infer an accurate model in a target domain where labels are not available or are scarce at best. A state-of-the-art approach for DA is due to (Ganin et al. 2016), known as DANN, where they attempt to induce a common representation of source and target domains via adversarial training. Th…

    Submitted 20 September, 2018; originally announced September 2018.

  15. arXiv:1806.08354  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO stat.ML

    Learning Instance Segmentation by Interaction

    Authors: Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

    Abstract: We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions g…

    Submitted 21 June, 2018; originally announced June 2018.

    Comments: Website at https://pathak22.github.io/seg-by-interaction/

  16. arXiv:1804.08606  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Zero-Shot Visual Imitation

    Authors: Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell

    Abstract: The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both 'what' and 'how' to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: Oral presentation at ICLR 2018. Website at https://pathak22.github.io/zeroshot-imitation/

  17. arXiv:1705.05363  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Curiosity-driven Exploration by Self-supervised Prediction

    Authors: Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

    Abstract: In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature sp…

    Submitted 15 May, 2017; originally announced May 2017.

    Comments: In ICML 2017. Website at https://pathak22.github.io/noreward-rl/

  18. arXiv:1611.01843  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG cs.NE physics.soc-ph

    Learning to Perform Physics Experiments via Deep Reinforcement Learning

    Authors: Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas

    Abstract: When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal driven way. This process of active interaction is in the same spirit as a scientist performing experiments to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman perf…

    Submitted 17 August, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

  19. arXiv:1508.06446  [pdf, ps, other]

    stat.ML cs.LG

    Nested Hierarchical Dirichlet Processes for Multi-Level Non-Parametric Admixture Modeling

    Authors: Lavanya Sita Tekumalla, Priyanka Agrawal, Indrajit Bhattacharya

    Abstract: Dirichlet Process (DP) is a Bayesian non-parametric prior for infinite mixture modeling, where the number of mixture components grows with the number of data items. The Hierarchical Dirichlet Process (HDP) is an extension of DP for grouped data, often used for non-parametric topic modeling, where each group is a mixture over shared mixture densities. The Nested Dirichlet Process (nDP), on the othe…

    Submitted 27 August, 2015; v1 submitted 26 August, 2015; originally announced August 2015.

    Comments: Proceedings of European Conference of Machine Learning (ECML) 2013

  20. arXiv:0804.2708  [pdf, ps, other]

    stat.AP

    Correlated Link Shadow Fading in Multi-hop Wireless Networks

    Authors: Piyush Agrawal, Neal Patwari

    Abstract: Accurate representation of the physical layer is required for analysis and simulation of multi-hop networking in sensor, ad hoc, and mesh networks. This paper investigates, models, and analyzes the correlations that exist in shadow fading between links in multi-hop networks. Radio links that are geographically proximate often experience similar environmental shadowing effects and thus have corre…

    Submitted 17 April, 2008; v1 submitted 16 April, 2008; originally announced April 2008.

    Comments: 26 pages with 10 figures and 2 tables