Showing 1–50 of 66 results for author: Beznosikov, A

  1. arXiv:2506.04430  [pdf, ps, other]

    cs.LG math.OC

    Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order

    Authors: Egor Petrov, Grigoriy Evseev, Aleksey Antonov, Andrey Veprikov, Pavel Plyusnin, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov

    Abstract: Fine-tuning Large Language Models (LLMs) is essential for adapting pre-trained models to downstream tasks. Yet traditional first-order optimizers such as Stochastic Gradient Descent (SGD) and Adam incur prohibitive memory and computational costs that scale poorly with model size. In this paper, we investigate zero-order (ZO) optimization methods as a memory- and compute-efficient alternative, part… ▽ More

    Submitted 11 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: 26 pages, 5 tables

  2. arXiv:2506.03725  [pdf, ps, other]

    cs.LG math.OC

    Sign-SGD is the Golden Gate between Multi-Node to Single-Node Learning: Significant Boost via Parameter-Free Optimization

    Authors: Daniil Medyakov, Sergey Stanko, Gleb Molodtsov, Philip Zmushko, Grigoriy Evseev, Egor Petrov, Aleksandr Beznosikov

    Abstract: Quite recently, large language models have made a significant breakthrough across various disciplines. However, training them is an extremely resource-intensive task, even for major players with vast computing resources. One of the methods gaining popularity in light of these challenges is Sign-SGD. This method can be applied both as a memory-efficient approach in single-node training and as a gra… ▽ More

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 58 pages, 5 figures, 5 tables

  3. arXiv:2506.02724  [pdf, ps, other]

    cs.LG math.OC

    WeightLoRA: Keep Only Necessary Adapters

    Authors: Andrey Veprikov, Vladimir Solodkin, Alexander Zyl, Andrey Savchenko, Aleksandr Beznosikov

    Abstract: The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation ($\texttt{LoRA}$), which adds trainable adapters to selected layers. Although $\texttt{LoRA}$ may obtain accurate solutions, it requires significant memory to train large models and intuition on which layers to add adapters. In this p… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 13 pages, 9 tables

  4. arXiv:2505.23510  [pdf, ps, other]

    math.OC

    Incorporating Preconditioning into Accelerated Approaches: Theoretical Guarantees and Practical Improvement

    Authors: Stepan Trifonov, Leonid Levin, Savelii Chezhegov, Aleksandr Beznosikov

    Abstract: Machine learning and deep learning are widely researched fields that provide solutions to many modern problems. Due to the complexity of new problems related to the size of datasets, efficient approaches are obligatory. In optimization theory, the Heavy Ball and Nesterov methods use momentum in their updates of model weights. On the other hand, the minimization problems considered may be…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 18 pages, 2 figures

  5. arXiv:2505.20817  [pdf, ps, other]

    math.OC cs.LG

    Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise

    Authors: Savelii Chezhegov, Aleksandr Beznosikov, Samuel Horváth, Eduard Gorbunov

    Abstract: Gradient clipping is a widely used technique in Machine Learning and Deep Learning (DL), known for its effectiveness in mitigating the impact of heavy-tailed noise, which frequently arises in the training of large language models. Additionally, first-order methods with clipping, such as Clip-SGD, exhibit stronger convergence guarantees than SGD under the $(L_0,L_1)$-smoothness assumption, a proper… ▽ More

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 33 pages

  6. arXiv:2505.07614  [pdf, ps, other]

    cs.LG math.OC

    Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy

    Authors: Gleb Molodtsov, Daniil Medyakov, Sergey Skorik, Nikolas Khachaturov, Shahane Tigranyan, Vladimir Aletov, Aram Avetisyan, Martin Takáč, Aleksandr Beznosikov

    Abstract: Recent advancements in machine learning have improved performance while also increasing computational demands. While federated and distributed setups address these issues, their structure is vulnerable to malicious influences. In this paper, we address a specific threat, Byzantine attacks, where compromised clients inject adversarial updates to derail global convergence. We combine the trust score… ▽ More

    Submitted 9 June, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2502.14648  [pdf, other]

    cs.LG math.OC

    Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling

    Authors: Daniil Medyakov, Gleb Molodtsov, Savelii Chezhegov, Alexey Rebrikov, Aleksandr Beznosikov

    Abstract: In today's world, machine learning is hard to imagine without large training datasets and models. This has led to the use of stochastic methods for training, such as stochastic gradient descent (SGD). SGD provides weak theoretical guarantees of convergence, but there are modifications, such as Stochastic Variance Reduced Gradient (SVRG) and StochAstic Recursive grAdient algoritHm (SARAH), that can… ▽ More

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 30 pages, 6 figures, 1 table

  8. arXiv:2502.07923  [pdf, other]

    math.OC cs.LG

    Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness

    Authors: Nikita Kornilov, Philip Zmushko, Andrei Semenov, Mark Ikonnikov, Alexander Gasnikov, Alexander Beznosikov

    Abstract: In recent years, non-convex optimization problems have more often been described by the generalized $(L_0, L_1)$-smoothness assumption rather than the standard one. Meanwhile, severely corrupted data used in these problems has increased the demand for methods capable of handling heavy-tailed noise, i.e., noise with a bounded $κ$-th moment. Motivated by these real-world trends and challenges, we explore sign-bas…

    Submitted 27 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  9. arXiv:2412.16414  [pdf, other]

    math.OC cs.DC cs.LG

    Accelerated Methods with Compressed Communications for Distributed Optimization Problems under Data Similarity

    Authors: Dmitry Bylinkin, Aleksandr Beznosikov

    Abstract: In recent years, as data and problem sizes have increased, distributed learning has become an essential tool for training high-performance models. However, the communication bottleneck, especially for high-dimensional data, is a challenge. Several techniques have been developed to overcome this problem. These include communication compression and implementation of local steps, which work particula… ▽ More

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Accepted at AAAI25, 31 pages, 108 figures, 9 appendices

  10. arXiv:2412.14935  [pdf, other]

    math.OC

    Effective Method with Compression for Distributed and Federated Cocoercive Variational Inequalities

    Authors: Daniil Medyakov, Gleb Molodtsov, Aleksandr Beznosikov

    Abstract: Variational inequalities, as an effective tool for solving applied problems, including machine learning tasks, have been attracting more and more attention from researchers in recent years. The use of variational inequalities covers a wide range of areas, from reinforcement learning and generative models to traditional applications in economics and game theory. At the same time, it is impossible t…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: In Russian

  11. arXiv:2409.14280  [pdf, other]

    math.OC cs.LG

    Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning

    Authors: Dmitry Bylinkin, Kirill Degtyarev, Aleksandr Beznosikov

    Abstract: Modern realities and trends in learning require more and more generalization ability of models, which leads to an increase in both models and training sample size. It is already difficult to solve such tasks in a single device mode. This is the reason why distributed and federated learning approaches are becoming more popular every day. Distributed computing involves communication between devices,… ▽ More

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: 25 pages, 15 figures, 4 appendices

  12. arXiv:2409.13428  [pdf, other]

    math.OC

    Methods for Solving Variational Inequalities with Markovian Stochasticity

    Authors: Vladimir Solodkin, Michael Ermoshin, Roman Gavrilenko, Aleksandr Beznosikov

    Abstract: In this paper, we present a novel stochastic method for solving variational inequalities (VI) in the context of Markovian noise. By leveraging the Extragradient technique, we can productively solve VI optimization problems characterized by Markovian dynamics. We demonstrate the efficacy of the proposed method through rigorous theoretical analysis, proving convergence under quite mild assumptions of $L$-Li…

    Submitted 20 September, 2024; originally announced September 2024.

  13. Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions

    Authors: Andrey Sadchikov, Savelii Chezhegov, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Distributed optimization plays an important role in modern large-scale machine learning and data processing systems by optimizing the utilization of computational resources. One of the classical and popular approaches is Local Stochastic Gradient Descent (Local SGD), characterized by multiple local updates before averaging, which is particularly useful in distributed environments to reduce communi… ▽ More

    Submitted 18 December, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 33 pages, 1 algorithm, 3 figures, 2 tables

  14. New Aspects of Black Box Conditional Gradient: Variance Reduction and One Point Feedback

    Authors: Andrey Veprikov, Aleksandr Bogdanov, Vladislav Minashkin, Aleksandr Beznosikov

    Abstract: This paper deals with the black-box optimization problem. In this setup, we do not have access to the gradient of the objective function, so we need to estimate it somehow. We propose a new type of approximation, JAGUAR, that memorizes information from previous iterations and requires $\mathcal{O}(1)$ oracle calls. We implement this approximation in the Frank-Wolfe and Gradient Descent algo…

    Submitted 17 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 29 pages, 5 algorithms, 3 figures, 1 table

  15. Method with Batching for Stochastic Finite-Sum Variational Inequalities in Non-Euclidean Setting

    Authors: Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Vasilii Novitskii, Alexander Gasnikov

    Abstract: Variational inequalities are a universal optimization paradigm that incorporates classical minimization and saddle point problems. Nowadays, more and more tasks require considering stochastic formulations of optimization problems. In this paper, we present an analysis of a method that gives optimal convergence estimates for monotone stochastic finite-sum variational inequalities. In contrast to the…

    Submitted 15 September, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 38 pages, 1 algorithm, 4 figures, 1 table

  16. arXiv:2408.01848  [pdf, ps, other]

    math.OC

    Methods for Optimization Problems with Markovian Stochasticity and Non-Euclidean Geometry

    Authors: Vladimir Solodkin, Andrew Veprikov, Aleksandr Beznosikov

    Abstract: This paper examines a variety of classical optimization problems, including well-known minimization tasks and more general variational inequalities. We consider a stochastic formulation of these problems, and unlike most previous work, we take into account the complex Markov nature of the noise. We also consider the geometry of the problem in an arbitrary non-Euclidean setting, and propose four me… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

  17. arXiv:2406.06788  [pdf, other]

    math.OC

    Stochastic Frank-Wolfe: Unified Analysis and Zoo of Special Cases

    Authors: Ruslan Nazykov, Aleksandr Shestakov, Vladimir Solodkin, Aleksandr Beznosikov, Gauthier Gidel, Alexander Gasnikov

    Abstract: The Conditional Gradient (or Frank-Wolfe) method is one of the most well-known methods for solving constrained optimization problems appearing in various machine learning tasks. The simplicity of iteration and applicability to many practical problems helped the method to gain popularity in the community. In recent years, the Frank-Wolfe algorithm has received many different extensions, including stoch…

    Submitted 15 September, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Appears in: The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). 42 pages, 13 algorithms, 8 figures, 3 tables. Reference: https://proceedings.mlr.press/v238/nazykov24a.html

  18. Accelerated Stochastic Gradient Method with Applications to Consensus Problem in Markov-Varying Networks

    Authors: Vladimir Solodkin, Savelii Chezhegov, Ruslan Nazikov, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Stochastic optimization is a vital field in the realm of mathematical optimization, finding applications in diverse areas ranging from operations research to machine learning. In this paper, we introduce a novel first-order optimization algorithm designed for scenarios where Markovian noise is present, incorporating Nesterov acceleration for enhanced efficiency. The convergence analysis is perform… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  19. arXiv:2406.04443  [pdf, other]

    cs.LG math.OC

    Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed

    Authors: Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov

    Abstract: Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the latter ones. Gradient clipping provably helps to achieve good high-probability convergence for such noises. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the…

    Submitted 13 March, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 63 pages, 8 figures

  20. arXiv:2406.00846  [pdf, other]

    cs.LG cs.DC math.OC

    Local Methods with Adaptivity via Scaling

    Authors: Savelii Chezhegov, Sergey Skorik, Nikolas Khachaturov, Danil Shalagin, Aram Avetisyan, Martin Takáč, Yaroslav Kholodov, Aleksandr Beznosikov

    Abstract: The rapid development of machine learning and deep learning has introduced increasingly complex optimization challenges that must be addressed. Indeed, training modern, advanced models has become difficult to implement without leveraging multiple computing nodes in a distributed environment. Distributed optimization is also fundamental to emerging fields such as federated learning. Specifically, t… ▽ More

    Submitted 16 September, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: 41 pages, 2 algorithms, 6 figures, 1 table

  21. arXiv:2404.13328  [pdf, other]

    math.OC

    Accelerated Methods with Compression for Horizontal and Vertical Federated Learning

    Authors: Sergey Stanko, Timur Karimullin, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Distributed optimization algorithms have emerged as a superior approach for solving machine learning problems. To accommodate the diverse ways in which data can be stored across devices, these methods must be adaptable to a wide range of situations. As a result, two orthogonal regimes of distributed algorithms are distinguished: horizontal and vertical. During parallel training, communication be…

    Submitted 20 April, 2024; originally announced April 2024.

  22. Extragradient Sliding for Composite Non-Monotone Variational Inequalities

    Authors: Roman Emelyanov, Andrey Tikhomirov, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Variational inequalities offer a versatile and straightforward approach to analyzing a broad range of equilibrium problems in both theoretical and practical fields. In this paper, we consider a composite generally non-monotone variational inequality represented as a sum of $L_q$-Lipschitz monotone and $L_p$-Lipschitz generally non-monotone operators. We applied a special sliding version of the cla… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 12 pages, 1 algorithm, 3 figures

  23. arXiv:2402.02490  [pdf, other]

    math.OC

    Decentralized Finite-Sum Optimization over Time-Varying Networks

    Authors: Dmitry Metelev, Savelii Chezhegov, Alexander Rogozin, Aleksandr Beznosikov, Alexander Sholokhov, Alexander Gasnikov, Dmitry Kovalev

    Abstract: We consider decentralized time-varying stochastic optimization problems where each of the functions held by the nodes has a finite sum structure. Such problems can be efficiently solved using variance reduction techniques. Our aim is to explore the lower complexity bounds (for communication and number of stochastic oracle calls) and find optimal algorithms. The paper studies strongly convex and no… ▽ More

    Submitted 7 February, 2025; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 48 pages, 2 figures, 2 tables

  24. Optimal Analysis of Method with Batching for Monotone Stochastic Finite-Sum Variational Inequalities

    Authors: Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Variational inequalities are a universal optimization paradigm that is interesting in itself, but also incorporates classical minimization and saddle point problems. Modern realities encourage considering stochastic formulations of optimization problems. In this paper, we present an analysis of a method that gives optimal convergence estimates for monotone stochastic finite-sum variational inequal…

    Submitted 26 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 22 pages, 1 algorithm, 2 figures, 1 table

  25. Optimal Data Splitting in Distributed Optimization for Machine Learning

    Authors: Daniil Medyakov, Gleb Molodtsov, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: The distributed optimization problem has become increasingly relevant recently. It has a lot of advantages such as processing a large amount of data in less time compared to non-distributed methods. However, most distributed approaches suffer from a significant bottleneck - the cost of communications. Therefore, a large amount of research has recently been directed at solving this problem. One suc… ▽ More

    Submitted 26 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 17 pages, 2 figures

  26. arXiv:2401.07788  [pdf, other]

    cs.LG cs.DC math.OC

    Activations and Gradients Compression for Model-Parallel Training

    Authors: Mikhail Rudakov, Aleksandr Beznosikov, Yaroslav Kholodov, Alexander Gasnikov

    Abstract: Large neural networks require enormous computational clusters of machines. Model-parallel training, when the model architecture is partitioned sequentially between workers, is a popular approach for training modern models. Information compression can be applied to decrease workers' communication time, as it is often a bottleneck in such systems. This work explores how simultaneous compression of ac…

    Submitted 26 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 17 pages, 6 figures, 5 tables

  27. About some works of Boris Polyak on convergence of gradient methods and their development

    Authors: Seydamet Ablaev, Aleksandr Beznosikov, Alexander Gasnikov, Darina Dvinskikh, Aleksandr Lobanov, Sergei Puchinin, Fedor Stonyakin

    Abstract: The paper presents a review of the state of the art of subgradient and accelerated methods of convex optimization, including in the presence of disturbances and access to various information about the objective function (function value, gradient, stochastic gradient, higher derivatives). For nonconvex problems, the Polyak-Łojasiewicz condition is considered and a review of the main results is given…

    Submitted 24 December, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: in Russian language

  28. arXiv:2311.06953  [pdf, other]

    math.OC

    Bregman Proximal Method for Efficient Communications under Similarity

    Authors: Aleksandr Beznosikov, Darina Dvinskikh, Dmitry Bylinkin, Andrei Semenov, Alexander Gasnikov

    Abstract: We propose a novel stochastic distributed method for both monotone and strongly monotone variational inequalities with Lipschitz operator and proper convex regularizers arising in various applications from game theory to adversarial training. By exploiting similarity, our algorithm overcomes the communication bottleneck that is a major issue in distributed optimization. The proposed method enjoys… ▽ More

    Submitted 4 October, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: 17 pages, 2 algorithms, 1 figure

  29. arXiv:2310.06081  [pdf, ps, other]

    math.OC cs.LG math.PR stat.ML

    Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting

    Authors: Aleksei Ustimenko, Aleksandr Beznosikov

    Abstract: In this work, we consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some Stochastic Differential Equation. The chain we study is a unified framework for theoretical analysis. It comes with almost arbitrary isotropic and state-dependent noise instead of normal and state-independent one as in most related papers. Moreover, in our chai…

    Submitted 30 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Appears in: The Twelfth International Conference on Learning Representations (ICLR 2024). 27 pages, 3 tables. Reference: https://openreview.net/forum?id=fjpfCOV4ru

  30. Real Acceleration of Communication Process in Distributed Algorithms with Compression

    Authors: Svetlana Tkachenko, Artem Andreev, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Modern applied optimization problems become more and more complex every day. Due to this fact, distributed algorithms that can speed up the process of solving an optimization problem through parallelization are of great importance. The main bottleneck of distributed algorithms is communications, which can slow down the method dramatically. One way to solve this issue is to use compression of trans… ▽ More

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 11 pages

  31. arXiv:2307.12946  [pdf, ps, other]

    math.OC

    Optimal Algorithm with Complexity Separation for Strongly Convex-Strongly Concave Composite Saddle Point Problems

    Authors: Ekaterina Borodich, Georgiy Kormakov, Dmitry Kovalev, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: In this work, we focus on the following saddle point problem $\min_x \max_y p(x) + R(x,y) - q(y)$ where $R(x,y)$ is $L_R$-smooth, $μ_x$-strongly convex, $μ_y$-strongly concave and $p(x), q(y)$ are convex and $L_p, L_q$-smooth respectively. We present a new algorithm with optimal overall complexity…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: work in progress

  32. Decentralized Optimization Over Slowly Time-Varying Graphs: Algorithms and Lower Bounds

    Authors: Dmitry Metelev, Aleksandr Beznosikov, Alexander Rogozin, Alexander Gasnikov, Anton Proskurnikov

    Abstract: We consider a decentralized convex unconstrained optimization problem, where the cost function can be decomposed into a sum of strongly convex and smooth functions, associated with individual agents, interacting over a static or time-varying network. Our main concern is the convergence rate of first-order optimization algorithms as a function of the network's graph, more specifically, of the condi… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

  33. Non-Smooth Setting of Stochastic Decentralized Convex Optimization Problem Over Time-Varying Graphs

    Authors: Aleksandr Lobanov, Andrew Veprikov, Georgiy Konin, Aleksandr Beznosikov, Alexander Gasnikov, Dmitry Kovalev

    Abstract: Distributed optimization has a rich history. It has demonstrated its effectiveness in many machine learning applications. In this paper we study a subclass of distributed optimization, namely decentralized optimization in a non-smooth setting. Decentralized means that $m$ agents (machines) working in parallel on one problem communicate only with the neighboring agents (machines), i.e. there is…

    Submitted 5 September, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2106.04469

  34. arXiv:2305.15938  [pdf, ps, other]

    math.OC cs.LG stat.ML

    First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities

    Authors: Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, Eric Moulines

    Abstract: This paper delves into stochastic optimization problems that involve Markovian noise. We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers scenarios for both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the unde… ▽ More

    Submitted 30 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). 41 pages, 3 algorithms, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2023/hash/8c3e38ce55a0fa44bc325bc6fdb7f4e5-Abstract-Conference.html

  35. arXiv:2304.11737  [pdf, other]

    math.OC cs.LG stat.ML

    Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features

    Authors: Aleksandr Beznosikov, David Dobre, Gauthier Gidel

    Abstract: The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learning applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which the computation of the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithms for stoch… ▽ More

    Submitted 15 September, 2024; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: Appears in: the 41st International Conference on Machine Learning (ICML 2024). 26 pages, 2 algorithms, 5 figures, 2 tables. Reference: https://proceedings.mlr.press/v235/beznosikov24a.html

  36. arXiv:2302.07615  [pdf, other]

    math.OC cs.DC cs.GT cs.LG stat.ML

    Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities

    Authors: Aleksandr Beznosikov, Martin Takáč, Alexander Gasnikov

    Abstract: Variational inequalities are a broad and flexible class of problems that includes minimization, saddle point, and fixed point problems as special cases. Therefore, variational inequalities are used in various applications ranging from equilibrium search to adversarial learning. With the increasing size of data and models, today's instances demand parallel and distributed computing for real-world m… ▽ More

    Submitted 30 March, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (https://proceedings.neurips.cc/paper_files/paper/2023/hash/5b4a459db23e6db9be2a128380953d96-Abstract-Conference.html). 36 pages, 3 algorithms, 1 figure, 1 table

  37. Randomized gradient-free methods in convex optimization

    Authors: Alexander Gasnikov, Darina Dvinskikh, Pavel Dvurechensky, Eduard Gorbunov, Aleksander Beznosikov, Aleksandr Lobanov

    Abstract: This review presents modern gradient-free methods to solve convex optimization problems. By gradient-free methods, we mean those that use only (noisy) realizations of the objective value. We are motivated by various applications where gradient information is prohibitively expensive or even unavailable. We mainly focus on three criteria: oracle complexity, iteration complexity, and the maximum perm… ▽ More

    Submitted 12 February, 2024; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: Survey paper; 9 pages

  38. Decentralized convex optimization over time-varying graphs: a survey

    Authors: Alexander Rogozin, Alexander Gasnikov, Aleksander Beznosikov, Dmitry Kovalev

    Abstract: Decentralized optimization over time-varying networks has a wide range of applications in distributed learning, signal processing and various distributed control problems. The agents of the distributed system locally hold optimization objectives and can communicate to their immediate neighbors over a network that changes from time to time. In this paper, we survey state-of-the-art results and desc… ▽ More

    Submitted 17 April, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

  39. SARAH-based Variance-reduced Algorithm for Stochastic Finite-sum Cocoercive Variational Inequalities

    Authors: Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Variational inequalities are a broad formalism that encompasses a vast number of applications. Motivated by applications in machine learning and beyond, stochastic methods are of great importance. In this paper we consider the problem of stochastic finite-sum cocoercive variational inequalities. For this class of problems, we investigate the convergence of the method based on the SARAH variance re… ▽ More

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 11 pages, 1 algorithm, 1 figure, 1 theorem

  40. arXiv:2208.13592  [pdf, ps, other]

    math.OC cs.GT cs.LG stat.ML

    Smooth Monotone Stochastic Variational Inequalities and Saddle Point Problems: A Survey

    Authors: Aleksandr Beznosikov, Boris Polyak, Eduard Gorbunov, Dmitry Kovalev, Alexander Gasnikov

    Abstract: This paper is a survey of methods for solving smooth (strongly) monotone stochastic variational inequalities. To begin with, we give the deterministic foundation from which the stochastic methods eventually evolved. Then we review methods for the general stochastic formulation, and look at the finite sum setup. The last parts of the paper are devoted to various recent (not necessarily stochastic)… ▽ More

    Submitted 2 April, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 12 pages

  41. Compression and Data Similarity: Combination of Two Techniques for Communication-Efficient Solving of Distributed Variational Inequalities

    Authors: Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Variational inequalities are an important tool, which includes minimization, saddles, games, and fixed-point problems. Modern large-scale and computationally expensive practical applications make distributed methods for solving these problems popular. Meanwhile, most distributed systems have a basic problem: a communication bottleneck. There are various techniques to deal with it. In particular, in t…

    Submitted 3 September, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

    Comments: v2: minor changes. 19 pages, 1 algorithm, 1 figure, 1 table, 1 theorem

  42. arXiv:2206.08303  [pdf, other]

    cs.LG math.OC

    On Scaled Methods for Saddle Point Problems

    Authors: Aleksandr Beznosikov, Aibek Alanov, Dmitry Kovalev, Martin Takáč, Alexander Gasnikov

    Abstract: Methods with adaptive scaling of different features play a key role in solving saddle point problems, primarily due to Adam's popularity for solving adversarial machine learning problems, including GAN training. This paper carries out a theoretical analysis of the following scaling techniques for solving SPPs: the well-known Adam and RmsProp scaling and the newer AdaHessian and OASIS based on Hut…

    Submitted 21 June, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: 54 pages, 2 algorithms with 4 options for each, 12 figures, 5 tables, 2 theorems

  43. Stochastic Gradient Methods with Preconditioned Updates

    Authors: Abdurakhmon Sadiev, Aleksandr Beznosikov, Abdulla Jasem Almansoori, Dmitry Kamzolov, Rachael Tappenden, Martin Takáč

    Abstract: This work considers the non-convex finite sum minimization problem. There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of th…

    Submitted 14 January, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: 40 pages, 2 new algorithms, 20 figures, 4 tables

  44. arXiv:2205.15136  [pdf, other]

    math.OC cs.DC cs.LG

    Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

    Authors: Dmitry Kovalev, Aleksandr Beznosikov, Ekaterina Borodich, Alexander Gasnikov, Gesualdo Scutari

    Abstract: We study structured convex optimization problems, with additive objective $r:=p + q$, where $r$ is ($μ$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For such a class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of g…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: 24 pages, 2 new algorithms, 12 theorems, 2 figures

  45. arXiv:2202.07262  [pdf, other]

    math.OC cs.LG

    Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

    Authors: Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, Nicolas Loizou

    Abstract: Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIP) appearing in various machine learning tasks. The success of the method led to several advanced extensions of the classical SGDA, including variants with arbitrary sampling, variance reduction, coordinate randomization, and distributed varian…

    Submitted 8 March, 2023; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: AISTATS 2023. 65 pages, 5 figures, 3 tables. Changes in v2: new results were added (Theorem 2.5 and its corollaries), few typos were fixed, more clarifications were added. Changes in v3: AISTATS formatting was applied, small clarifications were added. Code: https://github.com/hugobb/sgda

  46. arXiv:2202.02771  [pdf, other]

    math.OC cs.DC cs.LG

    Optimal Algorithms for Decentralized Stochastic Variational Inequalities

    Authors: Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, Alexander Gasnikov

    Abstract: Variational inequalities are a formalism that includes games, minimization, saddle point, and equilibrium problems as special cases. Methods for variational inequalities are therefore universal approaches for many applied tasks, including machine learning problems. This work concentrates on the decentralized setting, which is increasingly important but not well understood. In particular, we consid…

    Submitted 2 April, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 58 pages, 6 algorithms, 9 figures, 4 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/c959bb2cb164d37569a17fa67494d69a-Abstract-Conference.html

  47. arXiv:2201.12289  [pdf, other]

    math.OC

    The Power of First-Order Smooth Optimization for Black-Box Non-Smooth Problems

    Authors: Alexander Gasnikov, Anton Novitskii, Vasilii Novitskii, Farshed Abdukhakimov, Dmitry Kamzolov, Aleksandr Beznosikov, Martin Takáč, Pavel Dvurechensky, Bin Gu

    Abstract: Gradient-free/zeroth-order methods for black-box convex optimization have been extensively studied in the last decade, with the main focus on oracle call complexity. In this paper, besides the oracle complexity, we also focus on iteration complexity, and propose a generic approach that, based on optimal first-order methods, allows one to obtain in a black-box fashion new zeroth-order algorithms for no…

    Submitted 1 March, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Appears in: Proceedings of the 39th International Conference on Machine Learning (ICML 2022). Reference: https://proceedings.mlr.press/v162/gasnikov22a.html

  48. A Unified Analysis of Variational Inequality Methods: Variance Reduction, Sampling, Quantization and Coordinate Descent

    Authors: Aleksandr Beznosikov, Alexander Gasnikov, Karina Zainulina, Alexander Maslovskiy, Dmitry Pasechnyuk

    Abstract: In this paper, we present a unified analysis of methods for such a wide class of problems as variational inequalities, which includes minimization problems and saddle point problems. We develop our analysis on the modified Extra-Gradient method (the classic algorithm for variational inequalities) and consider the strongly monotone and monotone cases, which correspond to strongly-convex-strongly-c…

    Submitted 3 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: in Russian, 57 pages, 3 figures, 1 table

  49. Random-reshuffled SARAH does not need a full gradient computations

    Authors: Aleksandr Beznosikov, Martin Takáč

    Abstract: The StochAstic Recursive grAdient algoritHm (SARAH) is a variance reduced variant of the Stochastic Gradient Descent (SGD) algorithm that needs a gradient of the objective function from time to time. In this paper, we remove the necessity of a full gradient computation. This is achieved by using a randomized reshuffling strategy and aggregating stochastic gradients obtained in each epoch…

    Submitted 14 January, 2024; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: 20 pages, 2 algorithms, 5 figures, 3 tables

  50. arXiv:2107.10706  [pdf, other]

    math.OC cs.DC cs.LG

    Distributed Saddle-Point Problems Under Similarity

    Authors: Aleksandr Beznosikov, Gesualdo Scutari, Alexander Rogozin, Alexander Gasnikov

    Abstract: We study solution methods for (strongly-)convex-(strongly-)concave Saddle-Point Problems (SPPs) over networks of two types: master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms…

    Submitted 22 August, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Appears in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Minor modifications with respect to the NeurIPS version. 35 pages, 3 algorithms, 4 figures, 1 table

    Journal ref: https://proceedings.neurips.cc/paper/2021/hash/44e65d3e9bc2f88b2b3d566de51a5381-Abstract.html