
Showing 1–18 of 18 results for author: Rodomanov, A

  1. arXiv:2503.08427  [pdf, other]

    math.OC cs.LG

    Accelerated Distributed Optimization with Compression and Error Feedback

    Authors: Yuan Gao, Anton Rodomanov, Jeremy Rack, Sebastian U. Stich

    Abstract: Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a central server, has emerged as a key technique to mitigate communication bottlenecks. However, the theoretical understanding of stochastic distributed optimization wi…

    Submitted 29 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  2. arXiv:2501.10258  [pdf, other]

    math.OC cs.LG

    DADA: Dual Averaging with Distance Adaptation

    Authors: Mohammad Moshtaghifar, Anton Rodomanov, Daniil Vankov, Sebastian Stich

    Abstract: We present a novel universal gradient method for solving convex optimization problems. Our algorithm -- Dual Averaging with Distance Adaptation (DADA) -- is based on the classical scheme of dual averaging and dynamically adjusts its coefficients based on observed gradients and the distance between iterates and the starting point, eliminating the need for problem-specific parameters. DADA is a univ…

    Submitted 17 January, 2025; originally announced January 2025.

  3. arXiv:2410.10800  [pdf, other]

    math.OC

    Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

    Authors: Daniil Vankov, Anton Rodomanov, Angelia Nedich, Lalitha Sankar, Sebastian U. Stich

    Abstract: We study gradient methods for optimizing $(L_0, L_1)$-smooth functions, a class that generalizes Lipschitz-smooth functions and has gained attention for its relevance in machine learning. We provide new insights into the structure of this function class and develop a principled framework for analyzing optimization methods in this setting. While our convergence rate estimates recover existing resul…

    Submitted 7 March, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

  4. arXiv:2407.07084  [pdf, other]

    cs.LG math.OC

    Stabilized Proximal-Point Methods for Federated Optimization

    Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

    Abstract: In developing efficient optimization algorithms, it is crucial to account for communication constraints -- a significant challenge in modern Federated Learning. The best-known communication complexity among non-accelerated algorithms is achieved by DANE, a distributed proximal-point algorithm that solves local subproblems at each iteration and that can exploit second-order similarity among individ…

    Submitted 3 November, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Adaptive methods are added

  5. arXiv:2406.06398  [pdf, other]

    math.OC

    Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction

    Authors: Anton Rodomanov, Xiaowen Jiang, Sebastian Stich

    Abstract: We present adaptive gradient methods (both basic and accelerated) for solving convex composite optimization problems in which the main part is approximately smooth (a.k.a. $(δ, L)$-smooth) and can be accessed only via a (potentially biased) stochastic gradient oracle. This setting covers many interesting examples including Hölder smooth problems and various inexact computations of the stochastic g…

    Submitted 10 June, 2024; originally announced June 2024.

  6. arXiv:2404.15051  [pdf, ps, other]

    math.OC

    Global Complexity Analysis of BFGS

    Authors: Anton Rodomanov

    Abstract: In this paper, we present a global complexity analysis of the classical BFGS method with inexact line search, as applied to minimizing a strongly convex function with Lipschitz continuous gradient and Hessian. We consider a variety of standard line search strategies including the backtracking line search based on the Armijo condition, Armijo-Goldstein and Wolfe-Powell line searches. Our analysis s…

    Submitted 23 April, 2024; originally announced April 2024.

  7. arXiv:2404.08447  [pdf, other]

    cs.LG math.OC

    Federated Optimization with Doubly Regularized Drift Correction

    Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

    Abstract: Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown uniformly…

    Submitted 12 April, 2024; originally announced April 2024.

  8. arXiv:2403.02967  [pdf, other]

    math.OC cs.LG

    Non-convex Stochastic Composite Optimization with Polyak Momentum

    Authors: Yuan Gao, Anton Rodomanov, Sebastian U. Stich

    Abstract: The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is notoriously known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e. when only small or bounded batch sizes are used). In this paper, we focus o…

    Submitted 8 December, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  9. arXiv:2402.03210  [pdf, other]

    math.OC

    Universal Gradient Methods for Stochastic Convex Optimization

    Authors: Anton Rodomanov, Ali Kavis, Yongtao Wu, Kimon Antonakopoulos, Volkan Cevher

    Abstract: We develop universal gradient methods for Stochastic Convex Optimization (SCO). Our algorithms automatically adapt not only to the oracle's noise but also to the Hölder smoothness of the objective function without a priori knowledge of the particular setting. The key ingredient is a novel strategy for adjusting step-size coefficients in the Stochastic Gradient Method (SGD). Unlike AdaGrad, which a…

    Submitted 11 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  10. arXiv:2301.13194  [pdf, other]

    math.OC cs.LG

    Polynomial Preconditioning for Gradient Methods

    Authors: Nikita Doikov, Anton Rodomanov

    Abstract: We study first-order methods with preconditioning for solving structured nonlinear convex optimization problems. We propose a new family of preconditioners generated by symmetric polynomials. They provide first-order optimization methods with a provable improvement of the condition number, cutting the gaps between highest eigenvalues, without explicit knowledge of the actual spectrum. We give a st…

    Submitted 30 January, 2023; originally announced January 2023.

  11. arXiv:2301.08352  [pdf, other]

    math.OC

    Gradient Methods for Stochastic Optimization in Relative Scale

    Authors: Yurii Nesterov, Anton Rodomanov

    Abstract: We propose a new concept of a relatively inexact stochastic subgradient and present novel first-order methods that can use such objects to approximately solve convex optimization problems in relative scale. An important example where relatively inexact subgradients naturally arise is given by the Power or Lanczos algorithms for computing an approximate leading eigenvector of a symmetric positive s…

    Submitted 28 May, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

  12. arXiv:2106.13340  [pdf, ps, other]

    math.OC

    Subgradient Ellipsoid Method for Nonsmooth Convex Problems

    Authors: Anton Rodomanov, Yurii Nesterov

    Abstract: In this paper, we present a new ellipsoid-type algorithm for solving nonsmooth problems with convex structure. Examples of such problems include nonsmooth convex minimization problems, convex-concave saddle-point problems and variational inequalities with monotone operator. Our algorithm can be seen as a combination of the standard Subgradient and Ellipsoid methods. However, in contrast to the lat…

    Submitted 24 June, 2021; originally announced June 2021.

  13. New Results on Superlinear Convergence of Classical Quasi-Newton Methods

    Authors: Anton Rodomanov, Yurii Nesterov

    Abstract: We present a new theoretical analysis of local superlinear convergence of classical quasi-Newton methods from the convex Broyden class. As a result, we obtain a significant improvement in the currently known estimates of the convergence rates for these methods. In particular, we show that the corresponding rate of the Broyden-Fletcher-Goldfarb-Shanno method depends only on the product of the dimen…

    Submitted 1 June, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: J Optim Theory Appl (2021). arXiv admin note: text overlap with arXiv:2003.09174

    Journal ref: J Optim Theory Appl 188, 744-769 (2021)

  14. Rates of superlinear convergence for classical quasi-Newton methods

    Authors: Anton Rodomanov, Yurii Nesterov

    Abstract: We study the local convergence of classical quasi-Newton methods for nonlinear optimization. Although it was well established a long time ago that asymptotically these methods converge superlinearly, the corresponding rates of convergence still remain unknown. In this paper, we address this problem. We obtain first explicit non-asymptotic rates of superlinear convergence for the standard quasi-New…

    Submitted 1 June, 2021; v1 submitted 20 March, 2020; originally announced March 2020.

    Journal ref: Math. Program. (2021)

  15. Greedy Quasi-Newton Methods with Explicit Superlinear Convergence

    Authors: Anton Rodomanov, Yurii Nesterov

    Abstract: In this paper, we study greedy variants of quasi-Newton methods. They are based on the updating formulas from a certain subclass of the Broyden family. In particular, this subclass includes the well-known DFP, BFGS and SR1 updates. However, in contrast to the classical quasi-Newton methods, which use the difference of successive iterates for updating the Hessian approximations, our methods apply b…

    Submitted 1 June, 2021; v1 submitted 3 February, 2020; originally announced February 2020.

    Journal ref: SIAM Journal on Optimization, 2021, Vol. 31, No. 1 : pp. 785-811

  16. Smoothness parameter of power of Euclidean norm

    Authors: Anton Rodomanov, Yurii Nesterov

    Abstract: In this paper, we study derivatives of powers of Euclidean norm. We prove their Hölder continuity and establish explicit expressions for the corresponding constants. We show that these constants are optimal for odd derivatives and at most two times suboptimal for the even ones. In the particular case of integer powers, when the Hölder continuity transforms into the Lipschitz continuity, we improve…

    Submitted 1 June, 2021; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: J Optim Theory Appl (2020)

    Journal ref: J Optim Theory Appl 185, 303-326 (2020)

  17. arXiv:1904.04587  [pdf, ps, other]

    math.OC

    A Randomized Coordinate Descent Method with Volume Sampling

    Authors: Anton Rodomanov, Dmitry Kropotov

    Abstract: We analyze the coordinate descent method with a new coordinate selection strategy, called volume sampling. This strategy prescribes selecting subsets of variables of certain size proportionally to the determinants of principal submatrices of the matrix, that bounds the curvature of the objective function. In the particular case, when the size of the subsets equals one, volume sampling coincides wi…

    Submitted 29 April, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

  18. arXiv:1606.08988  [pdf, ps, other]

    math.OC

    Primal-Dual Method for Searching Equilibrium in Hierarchical Congestion Population Games

    Authors: Pavel Dvurechensky, Alexander Gasnikov, Evgenia Gasnikova, Sergey Matsievsky, Anton Rodomanov, Inna Usik

    Abstract: In this paper, we consider a large class of hierarchical congestion population games. One can show that the equilibrium in a game of such type can be described as a minimum point in a properly constructed multi-level convex optimization problem. We propose a fast primal-dual composite gradient method and apply it to the problem, which is dual to the problem describing the equilibrium in the consid…

    Submitted 25 August, 2016; v1 submitted 29 June, 2016; originally announced June 2016.

    Comments: Supplementary Proceedings of the 9th International Conference on Discrete Optimization and Operations Research and Scientific School (DOOR-2016)

    MSC Class: 90C06; 90C25; 90C33; 90C35; 90C90; 49M29; 65K05

    ACM Class: G.1.6