Showing 1–10 of 10 results for author: Babanezhad, R

Searching in archive math.

  1. arXiv:2503.00229  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

    Authors: Sharan Vaswani, Reza Babanezhad

    Abstract: Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant L and adapts to the ``local'' smoothness, enabling GD to converge faster. Existing theoretical analyses show that GD with Armijo-LS (GD-LS) can result in constant factor improvements over GD with a 1/L step-si…

    Submitted 3 June, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

    Comments: ICML 2025. 37 pages

  2. arXiv:2401.06738  [pdf, ps, other]

    math.OC cs.LG stat.ML

    (Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum

    Authors: Anh Dang, Reza Babanezhad, Sharan Vaswani

    Abstract: Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models, and often provides empirical improvements over stochastic gradient descent. By primarily focusing on strongly-convex quadratics, we aim to better understand the theoretical advantage of SHB and subsequently improve the method. For strongly-convex quadratics, Kidambi et al. (2018) show that SHB (with a mini-batc…

    Submitted 29 May, 2025; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: TMLR 2025

  3. arXiv:2305.16257  [pdf, other]

    cs.LG cs.AI math.SP

    Fast Online Node Labeling for Very Large Graphs

    Authors: Baojian Zhou, Yifan Sun, Reza Babanezhad

    Abstract: This paper studies the online node classification problem under a transductive learning setting. Current methods either invert a graph kernel matrix with $\mathcal{O}(n^3)$ runtime and $\mathcal{O}(n^2)$ space complexity or sample a large volume of random spanning trees, thus are difficult to scale to large graphs. In this work, we propose an improvement based on the \textit{online relaxation} tec…

    Submitted 28 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 40 pages, 17 figures, ICML 2023

  4. arXiv:2305.15249  [pdf, other]

    cs.LG cs.AI math.OC

    Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

    Authors: Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

    Abstract: Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the TD error, an objective that is potentially decorrelated with the true goal of achieving a high reward with the actor. We address this mismatch by designing a j…

    Submitted 30 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  5. arXiv:2302.02607  [pdf, other]

    cs.LG math.OC

    Target-based Surrogates for Stochastic Optimization

    Authors: Jonathan Wilder Lavington, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Nicolas Le Roux

    Abstract: We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning and adversarial training. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a \emph{target space} (e.g. the logits output by a linear model for classificatio…

    Submitted 8 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  6. arXiv:2110.11442  [pdf, other]

    math.OC cs.LG stat.ML

    Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

    Authors: Sharan Vaswani, Benjamin Dubois-Taine, Reza Babanezhad

    Abstract: We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $σ^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $κ$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an…

    Submitted 20 June, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: ICML 2022

  7. arXiv:2102.09645  [pdf, other]

    cs.LG math.OC stat.ML

    SVRG Meets AdaGrad: Painless Variance Reduction

    Authors: Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step…

    Submitted 2 November, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  8. arXiv:1906.03532  [pdf, other]

    cs.LG math.OC stat.ML

    Reducing the variance in online optimization by transporting past gradients

    Authors: Sébastien M. R. Arnold, Pierre-Antoine Manzagol, Reza Babanezhad, Ioannis Mitliagkas, Nicolas Le Roux

    Abstract: Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transpo…

    Submitted 18 June, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

    Comments: Open-source implementation available at: https://github.com/seba-1511/igt.pth

  9. arXiv:1511.01942  [pdf, other]

    cs.LG math.OC stat.CO stat.ML

    Stop Wasting My Gradients: Practical SVRG

    Authors: Reza Babanezhad, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen

    Abstract: We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the…

    Submitted 5 November, 2015; originally announced November 2015.

  10. arXiv:1504.04406  [pdf, other]

    stat.ML cs.LG math.OC stat.CO

    Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields

    Authors: Mark Schmidt, Reza Babanezhad, Mohamed Osama Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar

    Abstract: We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs). We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, propose a non-uniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the…

    Submitted 16 April, 2015; originally announced April 2015.

    Comments: AI/Stats 2015, 24 pages