Showing 1–16 of 16 results for author: Khaled, A

Searching in archive math.
  1. arXiv:2502.12329

    cs.LG cs.AI math.OC stat.ML

    A Novel Unified Parametric Assumption for Nonconvex Optimization

    Authors: Artem Riabinin, Ahmed Khaled, Peter Richtárik

    Abstract: Nonconvex optimization is central to modern machine learning, but the general framework of nonconvex optimization yields weak convergence guarantees that are too pessimistic compared to practice. On the other hand, while convexity enables efficient optimization, it is of limited applicability to many practical problems. To bridge this gap and better understand the practical success of optimization…

    Submitted 17 February, 2025; originally announced February 2025.

  2. arXiv:2405.15682

    cs.LG cs.AI math.OC stat.ML

    The Road Less Scheduled

    Authors: Aaron Defazio, Xingyu Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

    Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from c…

    Submitted 29 October, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2403.04081

    cs.LG math.OC

    Directional Smoothness and Gradient Methods: Convergence and Adaptivity

    Authors: Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

    Abstract: We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a seq…

    Submitted 13 January, 2025; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Published as a poster at NeurIPS 2024

  4. arXiv:2402.07793

    math.OC cs.LG stat.ML

    Tuning-Free Stochastic Optimization

    Authors: Ahmed Khaled, Chi Jin

    Abstract: Large-scale machine learning problems make the cost of hyperparameter tuning ever more prohibitive. This creates a need for algorithms that can tune themselves on-the-fly. We formalize the notion of "tuning-free" algorithms that can match the performance of optimally-tuned optimization algorithms up to polylogarithmic factors given only loose hints on the relevant problem parameters. We consider i…

    Submitted 18 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  5. arXiv:2305.16284

    cs.LG math.OC stat.ML

    DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

    Authors: Ahmed Khaled, Konstantin Mishchenko, Chi Jin

    Abstract: This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient -- matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters, and universal -- automatically adapting to both smooth and nonsmooth problems. While popular…

    Submitted 29 January, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 22 pages, 1 table, 4 figures

  6. arXiv:2209.02257

    cs.LG math.OC stat.ML

    Faster federated optimization under second-order similarity

    Authors: Ahmed Khaled, Chi Jin

    Abstract: Federated learning (FL) is a subfield of machine learning where multiple clients try to collaboratively learn a model over a network under communication constraints. We consider finite-sum federated optimization under a second-order function similarity condition and strong convexity, and propose two new algorithms: SVRP and Catalyzed SVRP. This second-order similarity condition has grown popular r…

    Submitted 22 May, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

    Comments: Published at ICLR 2023

  7. arXiv:2206.07021

    cs.LG math.OC

    Federated Optimization Algorithms with Random Reshuffling and Gradient Compression

    Authors: Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik

    Abstract: Gradient compression is a popular technique for improving communication complexity of stochastic first-order methods in distributed training of machine learning models. However, the existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well-known in practice and recently confirmed in theory that stochastic methods based on without-replacement sampling,…

    Submitted 3 November, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 66 pages, 6 figures. Changes in V2: the presentation of the results was changed, extra experiments were added. Code: https://github.com/IgorSokoloff/rr_with_compression_experiments_source_code

  8. arXiv:2111.11556

    cs.LG math.OC stat.ML

    FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning

    Authors: Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik

    Abstract: Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling sev…

    Submitted 23 February, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: V2: includes non-convex analysis as well as new large-scale experiments with neural networks. To appear in AISTATS 2022

  9. arXiv:2102.06704

    cs.LG math.OC

    Proximal and Federated Random Reshuffling

    Authors: Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

    Abstract: Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization. We propose two new algorithms: Proximal and Federated Random Reshuffling (ProxRR and FedRR). The first algorithm, ProxRR, solves composite convex finite-sum minimization problems in which the objective is the sum of a (potentially…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 21 pages, 2 figures, 3 algorithms

  10. arXiv:2006.11573

    cs.LG math.OC stat.ML

    Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

    Authors: Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik

    Abstract: We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex. Instead, we only rely on convexity of the loss function. Our unified analysis appli…

    Submitted 20 June, 2020; originally announced June 2020.

  11. arXiv:2006.05988

    math.OC cs.LG stat.ML

    Random Reshuffling: Simple Analysis with Vast Improvements

    Authors: Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

    Abstract: Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its sibling Stochastic Gradient Descent (SGD), RR is usually faster in practice and enjoys significant popularity in convex and non-convex optimization. The convergence rate of RR has attracted substantial attention r…

    Submitted 5 April, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: v3 updates: Theorem 4 includes a new result for Polyak-Lojasiewicz functions. NeurIPS 2020. 35 pages, 2 figures, 2 tables, 3 algorithms

  12. arXiv:2002.03329

    math.OC cs.LG stat.ML

    Better Theory for SGD in the Nonconvex World

    Authors: Ahmed Khaled, Peter Richtárik

    Abstract: Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex setting and propose a new variant of the recently introduced expected smoothness assumption which governs the behaviour of the second moment of the stochastic grad…

    Submitted 24 July, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: 33 pages, 3 figures, 4 theorems, and 4 propositions. V3 updates: added several references on error conditions (Tseng, Solodov, Bottou and Tsitsiklis, Grimmer), added a full proof of Corollary 1, cleaned up several proofs, and made minor adjustments to text for clarity

  13. arXiv:1912.09925

    cs.LG cs.DC math.NA math.OC

    Distributed Fixed Point Methods with Compressed Iterates

    Authors: Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč

    Abstract: We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed. This problem is motivated by the practice of federated learning, where a large model stored in the cloud is compressed before it is sent to a mobile device, which then proceeds with training based on local data. We develop standard and variance reduced methods, and establis…

    Submitted 20 December, 2019; originally announced December 2019.

    Comments: 15 pages, 4 algorithms, 4 Theorems

  14. arXiv:1909.04746

    cs.LG cs.DC math.NA math.OC stat.ML

    Tighter Theory for Local SGD on Identical and Heterogeneous Data

    Authors: Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

    Abstract: We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. T…

    Submitted 14 April, 2022; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: AISTATS 2020. 31 pages, 1 algorithm, 5 theorems, 6 figures

  15. arXiv:1909.04716

    cs.LG cs.DC math.NA math.OC stat.ML

    Gradient Descent with Compressed Iterates

    Authors: Ahmed Khaled, Peter Richtárik

    Abstract: We propose and analyze a new type of stochastic first order method: gradient descent with compressed iterates (GDCI). GDCI in each iteration first compresses the current iterate using a lossy randomized compression technique, and subsequently takes a gradient step. This method is a distillation of a key ingredient in the current practice of federated learning, where a model needs to be compressed…

    Submitted 18 March, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019 Workshop on Federated Learning for Data Privacy and Confidentiality. 10 pages, 1 algorithm, 1 theorem, 5 lemmas

  16. arXiv:1909.04715

    cs.LG cs.DC math.NA math.OC stat.ML

    First Analysis of Local GD on Heterogeneous Data

    Authors: Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

    Abstract: We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions. Problems of this form and local gradient descent as a solution method are of importance in federated learning, where each function is based on private data stored by a user on a mobile device, and the data of different users can be arbitrarily heter…

    Submitted 18 March, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019 Workshop on Federated Learning for Data Privacy and Confidentiality. 11 pages, 4 lemmas, 1 theorem