
Showing 1–18 of 18 results for author: Vaswani, S

Searching in archive stat.
  1. arXiv:2503.00229

    cs.LG math.OC stat.ML

    Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

    Authors: Sharan Vaswani, Reza Babanezhad

    Abstract: Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant L and adapts to the "local" smoothness, enabling GD to converge faster. Existing theoretical analyses show that GD with Armijo-LS (GD-LS) can result in constant factor improvements over GD with a 1/L step-si…

    Submitted 3 June, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

    Comments: ICML 2025. 37 pages

  2. arXiv:2401.06738

    math.OC cs.LG stat.ML

    (Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum

    Authors: Anh Dang, Reza Babanezhad, Sharan Vaswani

    Abstract: Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models, and often provides empirical improvements over stochastic gradient descent. By primarily focusing on strongly-convex quadratics, we aim to better understand the theoretical advantage of SHB and subsequently improve the method. For strongly-convex quadratics, Kidambi et al. (2018) show that SHB (with a mini-batc…

    Submitted 29 May, 2025; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: TMLR 2025

  3. arXiv:2206.06270

    cs.LG stat.ML

    Near-Optimal Sample Complexity Bounds for Constrained MDPs

    Authors: Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

    Abstract: In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator).…

    Submitted 19 November, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: NeurIPS'22

  4. arXiv:2110.11442

    math.OC cs.LG stat.ML

    Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

    Authors: Sharan Vaswani, Benjamin Dubois-Taine, Reza Babanezhad

    Abstract: We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $σ^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $κ$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an…

    Submitted 20 June, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: ICML 2022

  5. arXiv:2108.05828

    cs.LG cs.AI stat.ML

    A general class of surrogate functions for stable and efficient reinforcement learning

    Authors: Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

    Abstract: Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives ris…

    Submitted 30 October, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Fixed minor typos

  6. arXiv:2102.09645

    cs.LG math.OC stat.ML

    SVRG Meets AdaGrad: Painless Variance Reduction

    Authors: Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step…

    Submitted 2 November, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  7. arXiv:2006.06835

    cs.LG math.OC stat.ML

    Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)

    Authors: Sharan Vaswani, Issam Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Adaptive gradient methods are typically used for training over-parameterized models. To better understand their behaviour, we study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data. In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate. When interpolation is only ap…

    Submitted 18 February, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  8. arXiv:2006.06821

    cs.LG stat.ML

    To Each Optimizer a Norm, To Each Norm its Generalization

    Authors: Sharan Vaswani, Reza Babanezhad, Jose Gallego-Posada, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux

    Abstract: We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoni…

    Submitted 11 June, 2020; originally announced June 2020.

  9. arXiv:2002.10542

    math.OC cs.LG stat.ML

    Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

    Authors: Nicolas Loizou, Sharan Vaswani, Issam Laradji, Simon Lacoste-Julien

    Abstract: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting th…

    Submitted 22 March, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  10. arXiv:1910.04928

    cs.LG stat.ML

    Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

    Authors: Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton

    Abstract: We propose RandUCB, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation. In the $K$-armed bandit setting, we show that there are infinitely many variants of RandUCB, all of which achieve the minimax-optimal…

    Submitted 22 March, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020

  11. arXiv:1910.04920

    cs.LG math.OC stat.ML

    Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

    Authors: Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien

    Abstract: We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient…

    Submitted 22 March, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: AISTATS, 2020

  12. arXiv:1905.09997

    cs.LG math.OC stat.ML

    Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

    Authors: Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien

    Abstract: Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques t…

    Submitted 4 June, 2021; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added a citation to the related work of Paul Tseng, and citations to methods that had previously explored line-searches for deep learning empirically

  13. arXiv:1811.05154

    cs.LG stat.ML

    Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

    Authors: Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

    Abstract: We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We an…

    Submitted 20 June, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: Proceedings of the 36th International Conference on Machine Learning

  14. arXiv:1810.07288

    cs.LG stat.ML

    Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

    Authors: Sharan Vaswani, Francis Bach, Mark Schmidt

    Abstract: Modern machine learning focuses on highly expressive models that are able to fit or interpolate the data completely, resulting in zero training loss. For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the converg…

    Submitted 5 April, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: AISTATS 2019

  15. arXiv:1810.04336

    cs.LG stat.ML

    Combining Bayesian Optimization and Lipschitz Optimization

    Authors: Mohamed Osama Ahmed, Sharan Vaswani, Mark Schmidt

    Abstract: Bayesian optimization and Lipschitz optimization have developed alternative techniques for optimizing black-box functions. They each exploit a different form of prior about the function. In this work, we explore strategies to combine these techniques for better global optimization. In particular, we propose ways to use the Lipschitz continuity assumption within traditional BO algorithms, which we…

    Submitted 28 July, 2020; v1 submitted 9 October, 2018; originally announced October 2018.

  16. arXiv:1805.09793

    cs.LG stat.ML

    New Insights into Bootstrapping for Bandits

    Authors: Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, Yasin Abbasi-Yadkori

    Abstract: We investigate the use of bootstrapping in the bandit setting. We first show that the commonly used non-parametric bootstrapping (NPB) procedure can be provably inefficient and establish a near-linear lower bound on the regret incurred by it under the bandit model with Bernoulli rewards. We show that NPB with an appropriate amount of forced exploration can result in sub-linear albeit sub-optimal r…

    Submitted 24 May, 2018; originally announced May 2018.

  17. arXiv:1605.06593

    cs.LG cs.AI cs.SI math.OC stat.ML

    Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

    Authors: Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

    Abstract: We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, an…

    Submitted 19 June, 2018; v1 submitted 21 May, 2016; originally announced May 2016.

    Comments: Compared with the previous version, this version has fixed a mistake. This version is also consistent with the NIPS camera-ready version

    Journal ref: Z. Wen, B. Kveton, M. Valko, and S. Vaswani, "Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback", Advances in Neural Information Processing Systems 30 Proceedings, 2017

  18. arXiv:1503.00024

    cs.SI cs.LG stat.ML

    Influence Maximization with Bandits

    Authors: Sharan Vaswani, Laks. V. S. Lakshmanan, Mark Schmidt

    Abstract: We consider the problem of influence maximization, the problem of maximizing the number of people that become aware of a product by finding the "best" set of "seed" users to expose the product to. Most prior work on this topic assumes that we know the probability of each user influencing each other user, or we have data that lets us estimate these influences. However, this information is ty…

    Submitted 27 April, 2016; v1 submitted 27 February, 2015; originally announced March 2015.

    Comments: 12 pages
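
Entry 1 describes Armijo line-search as a standard way to set the gradient-descent step-size without knowing the global smoothness constant L. The following is a minimal sketch of plain (deterministic) gradient descent with textbook backtracking Armijo line-search, in Python with NumPy. The function name `gd_armijo` and the parameters `eta_max`, `c` and `beta` are illustrative assumptions; this is not the specific variant or analysis from the paper.

```python
import numpy as np

def gd_armijo(f, grad, x0, eta_max=1.0, c=0.5, beta=0.8, iters=100):
    """Gradient descent whose step-size is set by a backtracking Armijo line-search.

    eta_max, c and beta are illustrative defaults, not values taken from the paper.
    Note that no smoothness constant L is passed in.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gnorm2 = float(g @ g)
        if gnorm2 == 0.0:  # stationary point reached
            break
        fx = f(x)
        eta = eta_max  # restart each search from the largest candidate step
        # Shrink eta until the Armijo (sufficient-decrease) condition holds:
        #   f(x - eta * g) <= f(x) - c * eta * ||g||^2
        # The eta > 1e-12 guard only protects against floating-point stalls.
        while eta > 1e-12 and f(x - eta * g) > fx - c * eta * gnorm2:
            eta *= beta
        x = x - eta * g
    return x
```

For example, on the quadratic f(x) = 0.5 * x^T A x with A = diag(1, 10), calling `gd_armijo(lambda z: 0.5 * z @ A @ z, lambda z: A @ z, [1.0, 1.0], iters=500)` drives the iterate to the minimizer at the origin without L ever being supplied: each search simply accepts the first (largest) tried step that gives sufficient decrease, so larger steps are taken where the local curvature is small.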
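
Entry 9 proposes a stochastic Polyak step-size (SPS) that uses the optimal values of the individual losses. Below is a minimal sketch of SGD with a capped Polyak-type step, assuming those per-example optima f_i* are known: each update uses gamma = min{(f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_b} for a randomly sampled example i. The function name `sgd_sps_max`, the defaults for `c` and `gamma_b`, and the least-squares usage problem are illustrative assumptions, not the paper's code or experimental settings.

```python
import numpy as np

def sgd_sps_max(loss_i, grad_i, fstar, x0, c=0.5, gamma_b=10.0, iters=1000, seed=0):
    """SGD with a capped stochastic Polyak step-size (illustrative sketch).

    loss_i(x, i) and grad_i(x, i) evaluate the i-th example; fstar[i] is the
    (assumed known) minimum value of the i-th loss.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = len(fstar)
    for _ in range(iters):
        i = rng.integers(n)  # sample one example uniformly at random
        g = grad_i(x, i)
        gnorm2 = float(g @ g)
        if gnorm2 == 0.0:  # this example is already fit exactly
            continue
        # Polyak step for the sampled example, capped at gamma_b
        gamma = min((loss_i(x, i) - fstar[i]) / (c * gnorm2), gamma_b)
        x = x - gamma * g
    return x

# Usage: a consistent (interpolating) least-squares problem, so every f_i* = 0.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x_hat = sgd_sps_max(
    lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2,  # per-example squared loss
    lambda x, i: (A[i] @ x - b[i]) * A[i],      # its gradient
    np.zeros(20),                               # f_i* = 0 under interpolation
    np.zeros(5),
    iters=2000,
)
```

With c = 0.5 on this problem the step reduces to gamma = 1 / ||a_i||^2, so each update projects x onto the hyperplane a_i^T x = b_i (a randomized Kaczmarz step), which makes the example easy to sanity-check against `x_true`.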