-
Accelerated Bregman gradient methods for relatively smooth and relatively Lipschitz continuous minimization problems
Authors:
O. S. Savchuk,
M. S. Alkousa,
A. S. Shushko,
A. A. Vyguzov,
F. S. Stonyakin,
D. A. Pasechnyuk,
A. V. Gasnikov
Abstract:
In this paper, we propose accelerated methods for solving optimization problems with relatively smooth and relatively Lipschitz continuous functions under an inexact oracle. We consider the problem of minimizing a convex, differentiable function that is relatively smooth with respect to a reference convex function. The first proposed method is based on the similar triangles method with an inexact oracle and uses a special triangular scaling property of the Bregman divergence. The other proposed methods are non-adaptive and adaptive (tuning to the relative smoothness parameter) accelerated Bregman proximal gradient methods with an inexact oracle. These methods are universal in the sense that they apply not only to relatively smooth but also to relatively Lipschitz continuous optimization problems. We also introduce an adaptive intermediate Bregman method, which interpolates between slower but more robust non-accelerated algorithms and faster but less robust accelerated algorithms. We conclude the paper with numerical experiments demonstrating the advantages of the proposed algorithms on the Poisson inverse problem.
Submitted 23 November, 2024;
originally announced November 2024.
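For intuition about relative smoothness on the Poisson inverse problem, the sketch below runs the plain, non-accelerated Bregman gradient step with the Burg entropy $h(x) = -\sum_j \log x_j$ as reference function; it is not one of the accelerated or adaptive methods proposed above. The objective $f(x) = \sum_i [(Ax)_i - b_i \log (Ax)_i]$, the relative smoothness constant $L = \|b\|_1$ (a standard choice for this pair, assumed here), the test data, and the function names are illustrative assumptions.

```python
import numpy as np

def poisson_objective(A, b, x):
    """Poisson negative log-likelihood up to constants: sum_i (Ax)_i - b_i log (Ax)_i."""
    Ax = A @ x
    return float(np.sum(Ax - b * np.log(Ax)))

def bregman_gradient_burg(A, b, x0, iters=500):
    """Non-accelerated Bregman gradient method with reference h(x) = -sum_j log x_j."""
    L = float(np.sum(b))          # assumed relative smoothness constant ||b||_1
    t = 1.0 / L                   # step size 1/L
    x = x0.astype(float).copy()
    for _ in range(iters):
        grad = A.T @ (1.0 - b / (A @ x))
        # Mirror step grad h(x+) = grad h(x) - t * grad f(x) with grad h(x) = -1/x,
        # i.e. 1/x+ = 1/x + t * grad f(x). With nonnegative A, b and t <= 1/L
        # the denominator stays positive, so the iterates stay in the domain x > 0.
        x = x / (1.0 + t * x * grad)
    return x

# Hypothetical test instance with positive data and a consistent right-hand side.
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(20, 5))
b = A @ rng.uniform(0.5, 2.0, size=5)
x0 = np.ones(5)
x = bregman_gradient_burg(A, b, x0)
```

The closed-form multiplicative update is the reason the Burg entropy is convenient here: the Bregman step never needs a projection onto the positive orthant.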
-
OPTAMI: Global Superlinear Convergence of High-order Methods
Authors:
Dmitry Kamzolov,
Dmitry Pasechnyuk,
Artem Agafonov,
Alexander Gasnikov,
Martin Takáč
Abstract:
Second-order methods for convex optimization outperform first-order methods in terms of theoretical iteration convergence, achieving rates up to $O(k^{-5})$ for highly smooth functions. However, their practical performance and applications are limited by their multi-level structure and implementation complexity. In this paper, we present new results on high-order optimization methods, supported by their practical performance. First, we show that basic high-order methods, such as the Cubic Regularized Newton Method, exhibit global superlinear convergence for $μ$-strongly star-convex functions, a class that includes $μ$-strongly convex functions and some non-convex functions. These theoretical convergence results are both inspired and supported by the practical performance of the methods. Second, we propose a practical version of the Nesterov Accelerated Tensor method, called NATA. It significantly outperforms the classical variant and other high-order acceleration techniques in practice. The convergence of NATA is also supported by theoretical results. Finally, we introduce an open-source computational library for high-order methods, called OPTAMI. This library includes various methods, acceleration techniques, and subproblem solvers, all implemented as PyTorch optimizers, thereby facilitating the practical application of high-order methods to a wide range of optimization problems. We hope this library will simplify research on, and practical comparison of, methods beyond first order.
Submitted 12 October, 2024; v1 submitted 5 October, 2024;
originally announced October 2024.
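The Cubic Regularized Newton Method mentioned above takes the step $x^+ = \arg\min_y \langle g, y-x\rangle + \frac12\langle H(y-x), y-x\rangle + \frac{M}{6}\|y-x\|^3$. The NumPy sketch below is not the OPTAMI implementation; it assumes a positive semidefinite Hessian, in which case the step $h$ satisfies $(H + \frac{Mr}{2} I)h = -g$ with $r = \|h\|$, and $r$ is found by bisection. The ridge-regularised logistic test function and the conservative choice of $M$ are assumptions for illustration.

```python
import numpy as np

def cubic_newton_step(g, H, M, bisection_iters=100):
    """Exact minimiser h of <g,h> + 0.5 h^T H h + (M/6)||h||^3, assuming H is PSD."""
    if not np.any(g):
        return np.zeros_like(g)
    lam, Q = np.linalg.eigh(H)
    gq = Q.T @ g
    def step_norm(r):
        # ||(H + (M r / 2) I)^{-1} g||, computed in the eigenbasis of H
        return float(np.linalg.norm(gq / (lam + 0.5 * M * r)))
    lo, hi = 0.0, 1.0
    while step_norm(hi) > hi:       # bracket the root of step_norm(r) = r
        hi *= 2.0
    for _ in range(bisection_iters):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
    return -Q @ (gq / (lam + 0.5 * M * hi))

# Hypothetical test problem: ridge-regularised logistic loss (strongly convex).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
mu = 1.0

def grad_hess(x):
    s = 1.0 / (1.0 + np.exp(-(A @ x)))
    g = A.T @ s + mu * x
    H = A.T @ (A * (s * (1.0 - s))[:, None]) + mu * np.eye(3)
    return g, H

M = float(np.sum(np.linalg.norm(A, axis=1) ** 3))  # conservative Hessian-Lipschitz bound
x = np.ones(3)
for _ in range(100):
    g, H = grad_hess(x)
    x = x + cubic_newton_step(g, H, M)
```

Working in the eigenbasis of $H$ makes each bisection probe cheap, which is one reason this one-dimensional reduction is a common way to solve the cubic subproblem in small dimensions.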
-
Convergence analysis of stochastic gradient descent with adaptive preconditioning for non-convex and convex functions
Authors:
Dmitrii A. Pasechnyuk,
Alexander Gasnikov,
Martin Takáč
Abstract:
Preconditioning is a crucial operation in gradient-based numerical optimisation. It helps decrease the local condition number of a function by appropriately transforming its gradient. For a convex function, where the gradient can be computed exactly, the optimal linear transformation corresponds to the inverse of the Hessian operator, while the optimal convex transformation is the convex conjugate of the function. Different conditions result in variations of these dependencies. Practical algorithms often employ low-rank or stochastic approximations of the inverse Hessian matrix for preconditioning. However, theoretical guarantees for these algorithms typically lack a justification for the defining property of preconditioning. This paper presents a simple theoretical framework that demonstrates, given a smooth function and an available unbiased stochastic approximation of its gradient, that it is possible to refine the dependency of the convergence rate on the Lipschitz constant of the gradient.
Submitted 27 August, 2023;
originally announced August 2023.
-
Adaptive Methods for Variational Inequalities with Relatively Smooth and Relatively Strongly Monotone Operators
Authors:
S. S. Ablaev,
F. S. Stonyakin,
M. S. Alkousa,
D. A. Pasechnyuk
Abstract:
The article is devoted to adaptive methods for variational inequalities with relatively smooth and relatively strongly monotone operators. Starting from the recently proposed proximal variant of the extragradient method for this class of problems, we investigate in detail a version of the method with adaptively selected parameter values and prove an estimate of its convergence rate. The result is generalized to a class of variational inequalities with relatively strongly monotone, generalized smooth variational inequality operators. Numerical experiments have been performed for the ridge regression problem and for a variational inequality associated with box-simplex games.
Submitted 1 August, 2023;
originally announced August 2023.
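The idea of adaptively selected parameter values can be illustrated in the simplest Euclidean, unconstrained setting. The sketch below is an extragradient method with backtracking on a local Lipschitz estimate $L_k$; the halving/doubling rule, the test operator $F(z) = Bz + c$, and the function names are assumptions, and the Bregman (relatively smooth) machinery of the paper is not reproduced.

```python
import numpy as np

def adaptive_extragradient(F, z0, L0=1.0, iters=500):
    z, L = np.asarray(z0, dtype=float).copy(), L0
    for _ in range(iters):
        Fz = F(z)
        if not np.any(Fz):                         # exact solution reached
            break
        L /= 2.0                                   # optimistic decrease of the estimate
        while True:
            w = z - Fz / L                         # extrapolation step
            Fw = F(w)
            if np.linalg.norm(Fw - Fz) <= L * np.linalg.norm(w - z):
                break                              # local Lipschitz estimate accepted
            L *= 2.0                               # otherwise increase L and retry
        z = z - Fw / L                             # update step with the operator at w
    return z, L

# Hypothetical strongly monotone affine operator F(z) = Bz + c (B = I + skew part).
B = np.array([[1.0, 2.0], [-2.0, 1.0]])
c = np.array([1.0, -1.0])
z, L = adaptive_extragradient(lambda v: B @ v + c, np.zeros(2))
z_star = np.linalg.solve(B, -c)
```

Only the pair $(z, w)$ from the current iteration is checked, so the method never needs the global Lipschitz constant of $F$.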
-
Adaptive Variant of the Frank-Wolfe Algorithm for Convex Optimization Problems
Authors:
G. V. Aivazian,
F. S. Stonyakin,
D. A. Pasechnyuk,
M. S. Alkousa,
A. M. Raigorodskii
Abstract:
A variant of the Frank-Wolfe method for convex optimization problems is proposed, with adaptive selection of the step parameter based on information about the smoothness of the objective function (the Lipschitz constant of the gradient). Theoretical estimates of the quality of the solution provided by the method are obtained in terms of the adaptively selected parameters L_k. An important feature of the result is the analysis of a situation in which it is possible to guarantee, after an iteration is completed, that the residual in the objective function decreases by at least a factor of 2. At the same time, using adaptively selected parameters in the theoretical estimates makes it possible to apply the method to both smooth and nonsmooth problems, provided that the exit criterion of the iteration is met. For smooth problems, this criterion can be proved to hold, and the theoretical estimates of the method are guaranteed to be optimal up to a constant factor. Computational experiments were performed, including a comparison with two other algorithms, which demonstrated the efficiency of the algorithm on a number of both smooth and nonsmooth problems.
Submitted 29 July, 2023;
originally announced July 2023.
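The adaptive step rule can be sketched for a least-squares objective over the probability simplex. Assumptions for illustration: the short-step formula $\alpha = \min(1, \mathrm{gap}/(L\|d\|^2))$, the acceptance test $f(x + \alpha d) \le f(x) - \alpha\,\mathrm{gap} + \frac{\alpha^2 L}{2}\|d\|^2$, and halving $L$ at the start of each iteration; the paper's exact exit criterion may differ.

```python
import numpy as np

def adaptive_frank_wolfe(f, grad, x0, L0=1.0, iters=1000):
    x, L = np.asarray(x0, dtype=float).copy(), L0
    history = [f(x)]
    for _ in range(iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0             # linear minimisation oracle over the simplex
        d = s - x
        gap = float(-g @ d)               # Frank-Wolfe gap, nonnegative for convex f
        dd = float(d @ d)
        if gap <= 1e-12 or dd == 0.0:
            break
        L /= 2.0                          # try a smaller smoothness estimate first
        while True:
            alpha = min(1.0, gap / (L * dd))
            x_new = x + alpha * d         # convex combination, stays in the simplex
            if f(x_new) <= f(x) - alpha * gap + 0.5 * alpha**2 * L * dd:
                break                     # exit criterion of the iteration met
            L *= 2.0
        x = x_new
        history.append(f(x))
    return x, history

# Hypothetical least-squares problem on the 4-dimensional probability simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))
b = A @ np.array([0.7, 0.3, 0.0, 0.0])
f = lambda x: 0.5 * float(np.sum((A @ x - b) ** 2))
grad = lambda x: A.T @ (A @ x - b)
x, history = adaptive_frank_wolfe(f, grad, np.full(4, 0.25))
```

With this choice of $\alpha$, the right-hand side of the acceptance test never exceeds $f(x)$, so every accepted step is a descent step.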
-
Primal-Dual Gradient Methods for Searching Network Equilibria in Combined Models with Nested Choice Structure and Capacity Constraints
Authors:
Meruza Kubentayeva,
Demyan Yarmoshik,
Mikhail Persiianov,
Alexey Kroshnin,
Ekaterina Kotliarova,
Nazarii Tupitsa,
Dmitry Pasechnyuk,
Alexander Gasnikov,
Vladimir Shvetsov,
Leonid Baryshev,
Alexey Shurupov
Abstract:
We consider a network equilibrium model (i.e., a combined model), which was proposed as an alternative to the classic four-step approach for travel forecasting in transportation networks. This model can be formulated as a convex minimization program. We extend the combined model to the case of the stable dynamics (SD) model in the traffic assignment stage, which imposes strict capacity constraints on the network. We propose a way to solve the corresponding dual optimization problems with accelerated gradient methods and give theoretical guarantees of their convergence. We conducted numerical experiments with the considered optimization methods on the Moscow and Berlin networks.
Submitted 1 July, 2023;
originally announced July 2023.
-
Algorithms for Euclidean-regularised Optimal Transport
Authors:
Dmitry A. Pasechnyuk,
Michael Persiianov,
Pavel Dvurechensky,
Alexander Gasnikov
Abstract:
This paper addresses the Optimal Transport problem regularized by the squared Euclidean $\ell_2$-norm. It offers theoretical guarantees on the iteration complexities of the Sinkhorn--Knopp algorithm, Accelerated Gradient Descent, Accelerated Alternating Minimisation, and Coordinate Linear Variance Reduction algorithms. Furthermore, the paper compares the practical efficiency of these methods and their counterparts when applied to the entropy-regularized Optimal Transport problem. This comparison is conducted through numerical experiments on the MNIST dataset.
Submitted 28 August, 2023; v1 submitted 1 July, 2023;
originally announced July 2023.
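For the problem $\min \langle C, X\rangle + \frac{\gamma}{2}\|X\|_F^2$ over nonnegative plans with marginals $a$ and $b$, the dual is $\max_{u,v} \langle u,a\rangle + \langle v,b\rangle - \frac{1}{2\gamma}\sum_{ij}[u_i + v_j - C_{ij}]_+^2$, and a primal plan is recovered as $X_{ij} = [u_i + v_j - C_{ij}]_+/\gamma$. The sketch below runs plain dual gradient ascent with step $\gamma/(n+m)$, an assumed bound on the inverse Lipschitz constant of the dual gradient. It is not one of the accelerated or Sinkhorn-type algorithms analysed in the paper, and the random cost matrix is hypothetical.

```python
import numpy as np

def quad_ot_dual_ascent(C, a, b, gamma=1.0, iters=20000):
    """Gradient ascent on the dual of min <C,X> + (gamma/2)||X||_F^2 s.t. X1=a, X^T1=b, X>=0."""
    n, m = C.shape
    u, v = np.zeros(n), np.zeros(m)
    step = gamma / (n + m)                # 1 / (assumed Lipschitz constant of the dual gradient)
    for _ in range(iters):
        X = np.maximum(u[:, None] + v[None, :] - C, 0.0) / gamma  # plan for current (u, v)
        u += step * (a - X.sum(axis=1))   # dual gradient in u: row-marginal residual
        v += step * (b - X.sum(axis=0))   # dual gradient in v: column-marginal residual
    return np.maximum(u[:, None] + v[None, :] - C, 0.0) / gamma

rng = np.random.default_rng(0)
C = rng.uniform(size=(4, 4))
a = np.full(4, 0.25)
b = np.full(4, 0.25)
X = quad_ot_dual_ascent(C, a, b)
```

Unlike the entropic case, the recovered plan can contain exact zeros, since the $[\cdot]_+$ operation truncates entries where $u_i + v_j \le C_{ij}$.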
-
Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula
Authors:
Kirill Brilliantov,
Fedor Pavutnitskiy,
Dmitry Pasechnyuk,
German Magai
Abstract:
Computing homotopy groups of spheres has long been a fundamental objective in algebraic topology. Various theoretical and algorithmic approaches have been developed to tackle this problem. In this paper we take a step towards the goal of comprehending the group-theoretic structure of the generators of these homotopy groups by leveraging the power of machine learning. Specifically, in the simplicial group setting of Wu's formula, we reformulate the problem of generating simplicial cycles as a problem of sampling from the intersection of algorithmic datasets related to Dyck languages. We present and evaluate language modelling approaches that employ multi-label information for input sequences, along with the necessary group-theoretic toolkit and non-neural baselines.
Submitted 1 June, 2023;
originally announced June 2023.
-
Upper bounds on the maximum admissible level of noise in zeroth-order optimisation
Authors:
Dmitrii A. Pasechnyuk,
Aleksandr Lobanov,
Alexander Gasnikov
Abstract:
In this paper, we leverage an information-theoretic upper bound on the maximum admissible level of noise (MALN) in convex Lipschitz-continuous zeroth-order optimisation to establish corresponding upper bounds for classes of strongly convex and smooth problems. We derive these bounds through non-constructive proofs via optimal reductions. Furthermore, we demonstrate that by employing a one-dimensional grid-search algorithm, one can devise an algorithm for simplex-constrained optimisation that offers a superior upper bound on the MALN compared to the case of ball-constrained optimisation and estimates asymptotic in dimensionality.
Submitted 28 October, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
A randomised non-descent method for global optimisation
Authors:
Dmitry A. Pasechnyuk,
Alexander Gornov
Abstract:
This paper proposes a novel algorithm for non-convex multimodal constrained optimisation problems. It is based on sequentially solving restrictions of the problem to sections of the feasible set by random subspaces (in general, manifolds) of low dimensionality. The approach can vary in the way subspaces are drawn, in their dimensionality, and in the method used to solve the restricted problems. We provide an empirical study of the algorithm on convex, unimodal, and multimodal optimisation problems and compare it with efficient algorithms intended for each class of problems.
Submitted 27 March, 2023;
originally announced March 2023.
-
A Damped Newton Method Achieves Global $O\left(\frac{1}{k^2}\right)$ and Local Quadratic Convergence Rate
Authors:
Slavomír Hanzely,
Dmitry Kamzolov,
Dmitry Pasechnyuk,
Alexander Gasnikov,
Peter Richtárik,
Martin Takáč
Abstract:
In this paper, we present the first stepsize schedule for the Newton method resulting in fast global and local convergence guarantees. In particular, a) we prove an $O\left( \frac 1 {k^2} \right)$ global rate, which matches the state-of-the-art global rate of the cubically regularized Newton method of Polyak and Nesterov (2006) and of the regularized Newton method of Mishchenko (2021) and Doikov and Nesterov (2021); b) we prove a local quadratic rate, which matches the best-known local rate of second-order methods; and c) our stepsize formula is simple, explicit, and does not require solving any subproblem. Our convergence proofs hold under affine-invariance assumptions closely related to the notion of self-concordance. Finally, our method has competitive performance when compared to existing baselines, which share the same fast global convergence guarantees.
Submitted 31 October, 2022;
originally announced November 2022.
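The stepsize schedule can be sketched as a damped Newton iteration $x^+ = x - \alpha [\nabla^2 f(x)]^{-1}\nabla f(x)$. Here we assume the form $\alpha = \frac{-1 + \sqrt{1 + 2 L_{est} G}}{L_{est} G}$, where $G = \sqrt{\nabla f(x)^\top [\nabla^2 f(x)]^{-1} \nabla f(x)}$ is the local dual norm of the gradient and $L_{est}$ is a user-chosen constant (the value below is a hypothetical choice). The code uses the algebraically equivalent and numerically stable form $\alpha = 2/(1 + \sqrt{1 + 2 L_{est} G})$. No subproblem is solved: each step costs one linear solve.

```python
import numpy as np

def damped_newton_step(g, H, L_est):
    d = np.linalg.solve(H, g)                      # Newton direction [f''(x)]^{-1} f'(x)
    G = float(np.sqrt(max(float(g @ d), 0.0)))     # local dual norm ||f'(x)||_{x,*}
    # Stable form of alpha = (-1 + sqrt(1 + 2 L G)) / (L G); alpha -> 1 as G -> 0.
    alpha = 2.0 / (1.0 + np.sqrt(1.0 + 2.0 * L_est * G))
    return -alpha * d

# Hypothetical test problem: ridge-regularised logistic loss.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))
mu = 1.0

def grad_hess(x):
    s = 1.0 / (1.0 + np.exp(-(A @ x)))
    g = A.T @ s + mu * x
    H = A.T @ (A * (s * (1.0 - s))[:, None]) + mu * np.eye(3)
    return g, H

x = np.ones(3)
for _ in range(100):
    g, H = grad_hess(x)
    x = x + damped_newton_step(g, H, L_est=1.0)   # L_est = 1.0 is an arbitrary choice
```

Far from the solution ($G$ large) the step is strongly damped, roughly $\alpha \approx \sqrt{2/(L_{est} G)}$, while near the solution $\alpha \to 1$ and the iteration reduces to the pure Newton step.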
-
Effects of momentum scaling for SGD
Authors:
Dmitry A. Pasechnyuk,
Alexander Gasnikov,
Martin Takáč
Abstract:
The paper studies the properties of stochastic gradient methods with preconditioning. We focus on momentum-updated preconditioners with momentum coefficient $β$. Seeking to explain the practical efficiency of scaled methods, we provide a convergence analysis in a norm associated with the preconditioner, and demonstrate that scaling allows one to get rid of the gradient's Lipschitz constant in convergence rates. Along the way, we emphasize the important role of $β$, which various authors undeservedly set to an arbitrary constant of the form $0.99...9$. Finally, we propose explicit constructive formulas for adaptive $β$ and step size values.
Submitted 21 October, 2022;
originally announced October 2022.
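As an illustration of a momentum-updated preconditioner, the sketch below uses an RMSProp-style diagonal scaling, $v \leftarrow \beta v + (1-\beta) g^2$ and $x \leftarrow x - \eta\, g/(\sqrt{v} + \epsilon)$, on a badly scaled quadratic with noisy gradients. This specific update, the constant $\beta = 0.99$, and the test problem are assumptions. The adaptive formulas for $\beta$ and the step size proposed in the paper are not reproduced.

```python
import numpy as np

def preconditioned_sgd(stoch_grad, x0, lr=0.01, beta=0.99, eps=1e-8, iters=2000):
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)                      # momentum-updated diagonal preconditioner
    for _ in range(iters):
        g = stoch_grad(x)
        v = beta * v + (1.0 - beta) * g**2    # exponential moving average with coefficient beta
        x = x - lr * g / (np.sqrt(v) + eps)   # gradient step scaled by the preconditioner
    return x

# Hypothetical badly scaled quadratic f(x) = 0.5 * sum(d * x**2) with noisy gradients.
rng = np.random.default_rng(0)
d = np.array([1.0, 100.0])
stoch_grad = lambda x: d * x + 0.01 * rng.standard_normal(x.shape)
x = preconditioned_sgd(stoch_grad, np.ones(2))
```

Because each coordinate is divided by a running estimate of its own gradient magnitude, the per-coordinate step is roughly $\eta$ regardless of the curvature $d_i$, so the step size does not have to be tuned to the largest Lipschitz constant.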
-
A Unified Analysis of Variational Inequality Methods: Variance Reduction, Sampling, Quantization and Coordinate Descent
Authors:
Aleksandr Beznosikov,
Alexander Gasnikov,
Karina Zainulina,
Alexander Maslovskiy,
Dmitry Pasechnyuk
Abstract:
In this paper, we present a unified analysis of methods for a wide class of problems, variational inequalities, which includes minimization problems and saddle point problems. We base our analysis on the modified Extra-Gradient method (the classic algorithm for variational inequalities) and consider the strongly monotone and monotone cases, which correspond to strongly-convex-strongly-concave and convex-concave saddle point problems. The theoretical analysis is based on parametric assumptions about Extra-Gradient iterations. Therefore, it can serve as a strong basis for combining existing methods and for creating new algorithms. In particular, to show this, we develop new robust methods, including methods with quantization, coordinate methods, distributed randomized local methods, and others. Most of these approaches have never been considered in the generality of variational inequalities and have previously been used only for minimization problems. The robustness of the new methods is also confirmed by numerical experiments with GANs.
Submitted 3 February, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Stochastic optimization in digital pre-distortion of the signal
Authors:
A. V. Alpatov,
E. A. Peters,
D. A. Pasechnyuk,
A. M. Raigorodskii
Abstract:
In this paper, we test the performance of some modern stochastic optimization methods and practices applied to the digital pre-distortion (DPD) problem, which is an important part of signal processing on base stations providing wireless communication. In the first part of our study, we focus on finding the best-performing method and its proper modifications. In the second part, we propose a new quasi-online testing framework that allows us to match our modelling results with the behaviour of a real-life DPD prototype, retest selected practices considered in the previous section, and confirm the advantages of the method that proved to be the best under real-life conditions. For the model used, the maximum achieved improvement in depth was 7% in the standard regime and 5% in the online one (the metric itself is on a logarithmic scale). We also achieved a halving of the working time while preserving a 3% and 6% improvement in depth for the standard and online regimes, respectively. All comparisons are made to the Adam method, which was highlighted as the best stochastic method for the DPD problem in [Pasechnyuk et al., 2021], and to the Adamax method, which is the best in the proposed online regime.
Submitted 28 January, 2022;
originally announced January 2022.
-
Network utility maximization by updating individual transmission rates
Authors:
Dmitry Pasechnyuk
Abstract:
This paper discusses the problem of maximizing the total data transmission utility of a computer network. The total utility is defined as the sum of the individual utilities (one for each node in the network), which are concave functions of the data transmission rate. For the case of non-strongly concave utilities, we propose an approach based on a fast gradient method applied to a dually smoothed objective function. As an alternative approach, we introduce stochastic oracles for the problem under consideration, interpret them as messages about the state of an individual node, and use randomized switching mirror descent to solve the problem. We propose interpretations of both approaches that allow the protocols of their operation to be implemented effectively in real-life computer network environments, taking into account distributed information storage and restricted communication capabilities. Numerical experiments were carried out to compare the proposed approaches on synthetic examples of network architectures.
Submitted 26 June, 2021;
originally announced June 2021.
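The idea of nodes exchanging price-like messages can be illustrated with classical dual decomposition for logarithmic utilities $U_i(x_i) = w_i \log x_i$ under link capacity constraints $Rx \le c$. Each flow sets $x_i = w_i / (R^\top \lambda)_i$ from the prices of the links it uses, and each link updates its price $\lambda_l$ with a projected dual gradient step. This plain scheme, the price floor, and the two-link example are assumptions. The paper's fast gradient method on a dually smoothed objective and its randomized switching mirror descent are not reproduced.

```python
import numpy as np

def dual_price_update(R, c, w, lam0, step=0.1, iters=2000, floor=1e-3):
    lam = np.asarray(lam0, dtype=float).copy()
    for _ in range(iters):
        path_price = R.T @ lam              # each flow sums the prices of the links it uses
        x = w / path_price                  # argmax of w_i*log(x_i) - x_i*price_i
        # Dual gradient is c - R x; prices rise on overloaded links, fall otherwise.
        lam = np.maximum(lam - step * (c - R @ x), floor)
    return w / (R.T @ lam), lam

R = np.array([[1.0, 1.0, 0.0],    # link 0 carries flows 0 and 1
              [1.0, 0.0, 1.0]])   # link 1 carries flows 0 and 2
c = np.array([1.0, 2.0])
w = np.ones(3)
x, lam = dual_price_update(R, c, w, np.ones(2))
```

For this example both links are saturated at the optimum, and solving the KKT conditions by hand gives $x = (1 - 1/\sqrt{3},\ 1/\sqrt{3},\ 1 + 1/\sqrt{3})$.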
-
Stochastic optimization for dynamic pricing
Authors:
Dmitry Pasechnyuk,
Pavel Dvurechensky,
Sergey Omelchenko,
Alexander Gasnikov
Abstract:
We consider the problem of supply and demand balancing, stated as a minimization problem for the total expected revenue function describing the behavior of both consumers and suppliers. In the considered market model, we assume that consumers follow the discrete choice demand model, while suppliers are equipped with some quantity adjustment costs. The resulting optimization problem is smooth and convex, making it amenable to efficient optimization algorithms aimed at automatically setting prices for online marketplaces. We propose to use stochastic gradient methods to solve this problem. We interpret the stochastic oracle as a response to the behavior of a random market participant, either a consumer or a supplier. This allows us to interpret the considered algorithms and describe a suitable behavior of consumers and suppliers that leads to fast convergence to the equilibrium in an environment close to a real marketplace.
Submitted 26 June, 2021;
originally announced June 2021.
-
Non-convex optimization in digital pre-distortion of the signal
Authors:
Dmitry Pasechnyuk,
Alexander Maslovskiy,
Alexander Gasnikov,
Anton Anikin,
Alexander Rogozin,
Alexander Gornov,
Andrey Vorobyev,
Eugeniy Yanitskiy,
Lev Antonov,
Roman Vlasov,
Anna Nikolaeva,
Maria Begicheva
Abstract:
In this paper, we present some observations on applying modern optimization methods to functionals describing digital predistortion (DPD) of signals with orthogonal frequency division multiplexing (OFDM) modulation. The considered family of model functionals is determined by the class of cascade Wiener--Hammerstein models, which can be represented as a computational graph consisting of various nonlinear blocks. To identify the optimization methods with the best convergence depth and rate for this family of models, we examine from multiple angles both modern techniques used in optimizing neural networks and numerous numerical methods for optimizing non-convex multimodal functions. The research highlights the most effective of the considered techniques and describes several useful observations about the properties of the models and the behavior of the optimization methods.
Submitted 18 March, 2021;
originally announced March 2021.
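A cascade Wiener--Hammerstein block (a linear filter, then a memoryless nonlinearity, then another linear filter) can be written as a small computational graph. The sketch below assumes causal FIR filters and a polynomial envelope nonlinearity $u \sum_k a_k |u|^{2k}$, one simple choice for complex baseband signals. The actual blocks and parameterisation of the model family studied in the paper are not specified here, and the filter taps are hypothetical.

```python
import numpy as np

def fir(x, h):
    # Causal FIR filter; output truncated to the input length.
    return np.convolve(x, h)[: len(x)]

def wiener_hammerstein(x, h_in, poly, h_out):
    u = fir(x, h_in)                                        # linear block 1
    mag2 = np.abs(u) ** 2
    v = u * sum(a * mag2**k for k, a in enumerate(poly))    # memoryless envelope nonlinearity
    return fir(v, h_out)                                    # linear block 2

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)  # hypothetical baseband samples
h_in = np.array([1.0, 0.3, -0.1])
h_out = np.array([0.9, 0.05])
y_lin = wiener_hammerstein(x, h_in, [1.0], h_out)           # identity nonlinearity
y_nl = wiener_hammerstein(x, h_in, [1.0, -0.05], h_out)     # mild gain compression
```

With the identity nonlinearity the cascade collapses to a single FIR filter with taps `np.convolve(h_in, h_out)`, which is a convenient sanity check for an implementation. The trainable parameters of such a graph are the filter taps and the polynomial coefficients.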
-
On the Computational Efficiency of Catalyst Accelerated Coordinate Descent
Authors:
Dmitry Pasechnyuk,
Vladislav Matyukhin
Abstract:
This article is devoted to one particular case of using universal accelerated proximal envelopes to obtain computationally efficient accelerated versions of methods for various optimization problem setups. We propose a proximally accelerated coordinate descent method that achieves an efficient per-iteration algorithmic complexity and allows one to take advantage of data sparseness. As an example, the proposed approach is applied to optimizing a SoftMax-like function, for which the described method weakens the dependence of the computational complexity on the dimension $n$ by a factor of $\mathcal{O}(\sqrt{n})$ and, in practice, demonstrates faster convergence than standard methods. As a further example of applying the proposed approach, we show how to obtain efficient methods for optimizing Markov Decision Processes (MDP) in a minimax formulation with a Nesterov-smoothed objective functional.
Submitted 11 March, 2021;
originally announced March 2021.
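For the SoftMax-like objective, the basic building block is a randomized coordinate step that keeps $Ax$ up to date with a single column update, which is what makes sparsity useful. The sketch below is plain, non-accelerated randomized coordinate descent on $f(x) = \mu \log \sum_i \exp((Ax)_i/\mu) + \frac{\lambda}{2}\|x\|^2$ with coordinate constants $L_j = \max_i A_{ij}^2/\mu + \lambda$. The ridge term, the constants, and the names are assumptions, and the proximal (Catalyst-type) acceleration of the paper is not included. This dense version still recomputes the softmax weights in $O(m)$ per step.

```python
import numpy as np

def softmax_like(Ax, x, mu, lam):
    z = Ax / mu
    zmax = z.max()
    return mu * (zmax + np.log(np.sum(np.exp(z - zmax)))) + 0.5 * lam * x @ x

def full_grad(A, Ax, x, mu, lam):
    z = Ax / mu
    p = np.exp(z - z.max())
    p /= p.sum()                          # softmax weights
    return A.T @ p + lam * x

def random_coordinate_descent(A, mu=1.0, lam=1.0, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    Ax = np.zeros(m)                      # maintained incrementally (cheap if A is sparse)
    L = np.max(A**2, axis=0) / mu + lam   # coordinate-wise smoothness constants
    for _ in range(iters):
        j = rng.integers(n)
        z = Ax / mu
        p = np.exp(z - z.max())
        p /= p.sum()
        g_j = A[:, j] @ p + lam * x[j]    # partial derivative along coordinate j
        delta = -g_j / L[j]
        x[j] += delta
        Ax += delta * A[:, j]             # update Ax without a full matrix-vector product
    return x, Ax

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
x, Ax = random_coordinate_descent(A)
```

Each coordinate step with step size $1/L_j$ decreases the objective by at least $g_j^2/(2L_j)$, so the method is monotone even though coordinates are sampled at random.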
-
Accelerated Proximal Envelopes: Application to the Coordinate Descent Method
Authors:
Dmitry Pasechnyuk,
Anton Anikin,
Vladislav Matyukhin
Abstract:
This article is devoted to one particular case of using universal accelerated proximal envelopes to obtain computationally efficient accelerated versions of methods for various optimization problem setups. In this paper, we propose a proximally accelerated coordinate descent method that achieves an efficient per-iteration algorithmic complexity and allows one to take advantage of problem sparseness. As an example, the proposed approach is applied to optimizing a SoftMax-like function, for which the described method weakens the dependence of the computational complexity on the problem dimension $n$ by a factor of $\mathcal{O}(\sqrt{n})$ and, in practice, demonstrates faster convergence than standard methods.
Submitted 12 January, 2021;
originally announced January 2021.
-
Solving strongly convex-concave composite saddle point problems with a small dimension of one of the variables
Authors:
Egor Gladin,
Ilya Kuruzov,
Fedor Stonyakin,
Dmitry Pasechnyuk,
Mohammad Alkousa,
Alexander Gasnikov
Abstract:
The article is devoted to the development of algorithmic methods ensuring efficient complexity bounds for strongly convex-concave saddle point problems in the case when one of the groups of variables is high-dimensional, and the other is relatively low-dimensional (up to a hundred). The proposed technique is based on reducing problems of this type to a problem of minimizing a convex (maximizing a concave) functional in one of the variables, for which it is possible to find an approximate gradient at an arbitrary point with the required accuracy using an auxiliary optimization subproblem with another variable. In this case, the ellipsoid method is used for low-dimensional problems (if necessary, with an inexact $δ$-subgradient), and accelerated gradient methods are used for high-dimensional problems. For the case of a very small dimension of one of the groups of variables (up to 5), an approach based on a new version of the multidimensional analog of Yu. E. Nesterov's method on a square (multidimensional dichotomy) is proposed, with the possibility of using inexact values of the gradient of the objective functional.
Submitted 25 October, 2022; v1 submitted 5 October, 2020;
originally announced October 2020.
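The ellipsoid method used for the low-dimensional variable can be sketched with the standard central-cut update. For a (sub)gradient $g$ at the centre $c$ of $E = \{x : (x-c)^\top P^{-1}(x-c) \le 1\}$, set $\tilde g = g/\sqrt{g^\top P g}$, $c^+ = c - P\tilde g/(n+1)$, and $P^+ = \frac{n^2}{n^2-1}\left(P - \frac{2}{n+1} P\tilde g \tilde g^\top P\right)$. This version uses exact gradients and a hypothetical unconstrained objective whose minimiser lies inside the initial ball of radius $R$. The inexact $δ$-subgradient variant and the reduction from the saddle point problem are not shown.

```python
import numpy as np

def ellipsoid_method(f, subgrad, n, R=10.0, iters=200):
    """Central-cut ellipsoid method for convex f on R^n (n >= 2), starting from a ball."""
    c = np.zeros(n)              # ellipsoid centre
    P = (R**2) * np.eye(n)       # E = {x : (x - c)^T P^{-1} (x - c) <= 1}
    best_x, best_f = c.copy(), f(c)
    for _ in range(iters):
        g = subgrad(c)
        gPg = float(g @ P @ g)
        if gPg <= 0.0:           # zero (sub)gradient or numerical breakdown
            break
        g_t = g / np.sqrt(gPg)
        Pg = P @ g_t
        c = c - Pg / (n + 1)
        P = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(Pg, Pg))
        fc = f(c)
        if fc < best_f:          # centres are not monotone in f, so track the best one
            best_x, best_f = c.copy(), fc
    return best_x, best_f

t = np.array([1.0, -2.0])                     # hypothetical minimiser
f = lambda x: float(np.sum((x - t) ** 2))
subgrad = lambda x: 2.0 * (x - t)
best_x, best_f = ellipsoid_method(f, subgrad, n=2)
```

The volume of the ellipsoid shrinks by a factor of at least $e^{-1/(2(n+1))}$ per iteration, independently of the conditioning of $f$, which is why the method is attractive when $n$ is small.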
-
Oracle Complexity Separation in Convex Optimization
Authors:
Anastasiya Ivanova,
Evgeniya Vorontsova,
Dmitry Pasechnyuk,
Alexander Gasnikov,
Pavel Dvurechensky,
Darina Dvinskikh,
Alexander Tyurin
Abstract:
Many convex optimization problems have a structured objective function written as a sum of functions with different types of oracles (full gradient, coordinate derivative, stochastic gradient) and different evaluation complexities of these oracles. In the strongly convex case, these functions also have different condition numbers, which eventually define the iteration complexity of first-order methods and the number of oracle calls required to achieve a given accuracy. Motivated by the desire to call the more expensive oracle fewer times, in this paper we consider minimization of a sum of two functions and propose a generic algorithmic framework that separates the oracle complexities for each component in the sum. As a specific example, for the $μ$-strongly convex problem $\min_{x\in \mathbb{R}^n} h(x) + g(x)$ with $L_h$-smooth function $h$ and $L_g$-smooth function $g$, a special case of our algorithm requires, up to a logarithmic factor, $O(\sqrt{L_h/μ})$ first-order oracle calls for $h$ and $O(\sqrt{L_g/μ})$ first-order oracle calls for $g$. Our general framework also covers the setting of strongly convex objectives, the setting when $g$ is given by a coordinate derivative oracle, and the setting when $g$ has a finite-sum structure and is available through a stochastic gradient oracle. In the latter two cases, we obtain, respectively, accelerated random coordinate descent and accelerated variance reduction methods with oracle complexity separation.
Submitted 11 March, 2022; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Inexact Relative Smoothness and Strong Convexity for Optimization and Variational Inequalities by Inexact Model
Authors:
Fedor Stonyakin,
Alexander Tyurin,
Alexander Gasnikov,
Pavel Dvurechensky,
Artem Agafonov,
Darina Dvinskikh,
Mohammad Alkousa,
Dmitry Pasechnyuk,
Sergei Artamonov,
Victorya Piskunova
Abstract:
In this paper, we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems, and variational inequalities. This framework yields many known methods as special cases, including the accelerated gradient method, composite optimization methods, level-set methods, and Bregman proximal methods. The idea of the framework is to construct an inexact model of the main problem component, i.e., the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows constructing new methods, which we illustrate by constructing a universal conditional gradient method and a universal method for variational inequalities with a composite structure. This method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem's smoothness. As a particular case of our general framework, we introduce relative smoothness for operators and propose an algorithm for variational inequalities (VIs) with such operators. We also generalize our framework to relatively strongly convex objectives and strongly monotone variational inequalities.
This paper is an extended and updated version of [arXiv:1902.00990]. In particular, we add an extension of relative strong convexity for optimization and variational inequalities.
Submitted 19 December, 2021; v1 submitted 23 January, 2020;
originally announced January 2020.
-
Adaptive Catalyst for Smooth Convex Optimization
Authors:
Anastasiya Ivanova,
Dmitry Pasechnyuk,
Dmitry Grishchenko,
Egor Shulgin,
Alexander Gasnikov,
Vladislav Matyukhin
Abstract:
In this paper, we present a generic framework that allows accelerating almost arbitrary non-accelerated deterministic and randomized algorithms for smooth convex optimization problems. The main approach of our envelope is the same as in Catalyst (Lin et al., 2015): an accelerated proximal outer gradient method, which is used as an envelope for a non-accelerated inner method for the $\ell_2$-regularized auxiliary problem. Our algorithm has two key differences: 1) easily verifiable stopping criteria for the inner algorithm; 2) the regularization parameter can be tuned along the way. As a result, the main contribution of our work is a new framework that applies to adaptive inner algorithms: Steepest Descent, Adaptive Coordinate Descent, and Alternating Minimization. Moreover, in the non-adaptive case, our approach allows obtaining Catalyst without the logarithmic factor that appears in the standard Catalyst (Lin et al., 2015, 2018).
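A simplified, non-adaptive version of such an envelope might look like the sketch below. It relies on several assumptions that are not taken from the paper. The regularization parameter `H` is fixed, so the paper's rule for tuning it along the way is not reproduced. Plain gradient descent serves as the inner method for $\min_x f(x) + \tfrac{H}{2}\|x - y_k\|^2$, with a gradient-norm stopping test and an illustrative accuracy schedule. The outer loop uses FISTA-style momentum. The function names are hypothetical.

```python
import numpy as np


def inner_gd(grad_f, L, H, y, x0, tol, max_iter=10_000):
    """Gradient descent on F(x) = f(x) + H/2 ||x - y||^2, stopped once ||grad F(x)|| <= tol."""
    x = x0.copy()
    step = 1.0 / (L + H)                 # F is (L + H)-smooth when f is L-smooth
    for _ in range(max_iter):
        g = grad_f(x) + H * (x - y)
        if np.linalg.norm(g) <= tol:     # easily checkable inner stopping test
            break
        x = x - step * g
    return x


def catalyst(grad_f, L, H, x0, outer_iters=100, tol0=1e-2):
    """Accelerated inexact proximal point envelope with a fixed H (non-adaptive sketch)."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for k in range(outer_iters):
        # inexact proximal step; the accuracy schedule tol0 / (k + 1)^2 is an assumption
        x = inner_gd(grad_f, L, H, y, x_prev, tol0 / (k + 1) ** 2)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)   # Nesterov/FISTA extrapolation
        x_prev, t = x, t_next
    return x_prev


# Hypothetical quadratic test: f(x) = 1/2 x^T A x - b^T x, which is 10-smooth.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 2.0])
x_hat = catalyst(lambda x: A @ x - b, L=10.0, H=1.0, x0=np.zeros(2))
```

Swapping `inner_gd` for another non-accelerated solver only requires that the solver accept the same stopping tolerance.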
Submitted 7 March, 2021; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Adaptive Mirror Descent for the Network Utility Maximization Problem
Authors:
Anastasiya Ivanova,
Fedor Stonyakin,
Dmitry Pasechnyuk,
Evgeniya Vorontsova,
Alexander Gasnikov
Abstract:
Network utility maximization is the most important problem in network traffic management. Given the growth of modern communication networks, we consider the utility maximization problem in a network with a large number of connections (links) that are used by a huge number of users. To solve this problem, we propose an adaptive mirror descent algorithm for problems with many constraints. The key feature of the algorithm is its dimension-free convergence rate. The convergence of the proposed scheme is proved theoretically, and the theoretical analysis is verified with numerical simulations. We compare the algorithm with another approach, which applies the ellipsoid method (EM) to the dual problem. Numerical experiments showed that the proposed algorithm performs significantly better than EM in large networks and when very high solution accuracy is not required. Our approach can be used in many network design paradigms, in particular, in software-defined networks.
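The productive/non-productive step rule that adaptive mirror descent typically uses for problems with many constraints can be sketched as follows. The sketch works in the Euclidean setting, where the mirror step is a projected gradient step. It is an illustration under assumptions, not the algorithm from the paper. The toy problem (negative log utilities of two flows on a box, with two linear link constraints combined as $g(x) = \max_i g_i(x)$), the box, and all names are hypothetical. A step is "productive" when $g(x_k) \le \varepsilon$, in which case it uses the objective gradient; otherwise it uses a subgradient of the most violated constraint. Both kinds of step use $h_k = \varepsilon / \|\nabla_k\|^2$. The loop stops once $\sum_k 1/\|\nabla_k\|^2 \ge 2\Theta_0^2/\varepsilon^2$, where $\Theta_0^2 \ge \tfrac12\|x_* - x_0\|^2$.

```python
import math


def adaptive_mirror_descent(f_grad, constraints, project, x0, eps, theta0_sq):
    """Euclidean adaptive mirror descent with productive/non-productive steps (sketch).

    constraints: list of (g_i, grad_g_i) pairs, combined as g(x) = max_i g_i(x).
    Assumes all (sub)gradients met along the way are nonzero.
    Returns the h_k-weighted average of the productive iterates.
    """
    x = list(x0)
    weight_sum = 0.0                     # sum of h_k over productive steps
    x_sum = [0.0] * len(x)               # sum of h_k * x_k over productive steps
    inv_norm_sum = 0.0                   # sum of 1 / ||grad_k||^2 over all steps
    while inv_norm_sum < 2.0 * theta0_sq / eps ** 2:
        values = [g(x) for g, _ in constraints]
        i_max = max(range(len(values)), key=values.__getitem__)
        productive = values[i_max] <= eps
        grad = f_grad(x) if productive else constraints[i_max][1](x)
        norm_sq = sum(gj * gj for gj in grad)
        h = eps / norm_sq                # adaptive step size
        if productive:
            weight_sum += h
            x_sum = [s + h * xj for s, xj in zip(x_sum, x)]
        inv_norm_sum += 1.0 / norm_sq
        x = project([xj - h * gj for xj, gj in zip(x, grad)])
    return [s / weight_sum for s in x_sum]


# Hypothetical toy problem: two flows, two links, utility log(1 + x_j) for each flow.
def f(x):
    return -math.log(1.0 + x[0]) - math.log(1.0 + x[1])


def f_grad(x):
    return [-1.0 / (1.0 + x[0]), -1.0 / (1.0 + x[1])]


links = [
    (lambda x: x[0] + 2.0 * x[1] - 1.5, lambda x: [1.0, 2.0]),
    (lambda x: 2.0 * x[0] + x[1] - 1.5, lambda x: [2.0, 1.0]),
]


def project_box(x):
    """Euclidean projection onto the box [0, 1]^2."""
    return [min(1.0, max(0.0, xj)) for xj in x]


x_bar = adaptive_mirror_descent(f_grad, links, project_box, [0.0, 0.0], eps=0.02, theta0_sq=1.0)
```

Here $\Theta_0^2 = 1$ is half the squared diameter of the box. The toy optimum is $x_* = (0.5, 0.5)$ with $f_* = -2\log 1.5$. Under these assumptions the averaged point satisfies $f(\bar x) - f_* \le \varepsilon$ and $g(\bar x) \le \varepsilon$.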
Submitted 11 December, 2019; v1 submitted 17 November, 2019;
originally announced November 2019.
-
Numerical methods for the resource allocation problem in networks
Authors:
Anastasiya Ivanova,
Dmitry Pasechnyuk,
Pavel Dvurechensky,
Alexander Gasnikov,
Evgeniya Vorontsova
Abstract:
In this paper, we consider the resource allocation problem in a network with a large number of connections that are used by a huge number of users. The resource allocation problem under discussion is a maximization problem with linear inequality constraints. To solve this problem, we construct the dual problem and propose the following numerical optimization methods for the dual: a fast gradient method, a stochastic projected subgradient method, an ellipsoid method, and a random gradient extrapolation method. Special attention is paid to the primal-dual analysis of these methods. For each method, we estimate the convergence rate. We also provide modifications of these methods for the setting of distributed computations, taking into account their application to networks.
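To make the dual approach concrete, here is a minimal sketch of a projected fast gradient method applied to the dual of a small resource allocation problem. The following are assumptions, not details from the paper. The utilities are quadratic, $u_j(x_j) = a_j x_j - x_j^2/2$, so the Lagrangian maximizer has the closed form $x(\lambda) = a - C^\top\lambda$. The constraints are link capacities $Cx \le b$. The dual gradient is $\nabla\varphi(\lambda) = b - Cx(\lambda)$, which is Lipschitz with constant $\|C\|_2^2$; the code uses the squared Frobenius norm as an upper bound. All names are hypothetical.

```python
import math


def dual_fast_gradient(a, C, b, iters=500):
    """Projected fast gradient method on the dual of
    max_x sum_j (a_j x_j - x_j^2 / 2)  s.t.  C x <= b   (hypothetical quadratic utilities).

    Returns the primal point x(lambda) recovered from the final dual iterate, and lambda.
    """
    m, n = len(C), len(a)
    L = sum(c * c for row in C for c in row)   # ||C||_F^2 >= ||C||_2^2 (Lipschitz constant)

    def x_of(lam):
        # closed-form maximizer of the Lagrangian: x(lam) = a - C^T lam
        return [a[j] - sum(C[i][j] * lam[i] for i in range(m)) for j in range(n)]

    def dual_grad(lam):
        # gradient of the dual function: b - C x(lam)
        x = x_of(lam)
        return [b[i] - sum(C[i][j] * x[j] for j in range(n)) for i in range(m)]

    lam = [0.0] * m
    y = lam[:]
    t = 1.0
    for _ in range(iters):
        g = dual_grad(y)
        lam_next = [max(0.0, yi - gi / L) for yi, gi in zip(y, g)]  # projection onto lam >= 0
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [p + ((t - 1.0) / t_next) * (p - q) for p, q in zip(lam_next, lam)]
        lam, t = lam_next, t_next
    return x_of(lam), lam


# Three users sharing two links: user 1 uses link 1, user 3 uses link 2, user 2 uses both.
x, lam = dual_fast_gradient(a=[1.0, 1.0, 1.0], C=[[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], b=[1.0, 1.0])
```

In this example both links are saturated at the optimum, which is $x = (2/3, 1/3, 2/3)$ with prices $\lambda = (1/3, 1/3)$.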
Submitted 8 February, 2021; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Scheduling strategies for resource allocation in a cellular base station
Authors:
Dmitry Pasechnyuk
Abstract:
This work studies the problem of scheduling the (time) resource allocation of a base station (cell tower) that interacts with clients (users of wireless mobile devices with Internet access) and with the servers from which the clients download web pages (files in general).
Submitted 15 September, 2019;
originally announced September 2019.
-
Gradient Methods for Problems with Inexact Model of the Objective
Authors:
Fedor Stonyakin,
Darina Dvinskikh,
Pavel Dvurechensky,
Alexey Kroshnin,
Olesya Kuznetsova,
Artem Agafonov,
Alexander Gasnikov,
Alexander Tyurin,
César A. Uribe,
Dmitry Pasechnyuk,
Sergei Artamonov
Abstract:
We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which includes as particular cases the $(δ,L)$ inexact oracle and the relative smoothness condition. We analyze a gradient method that uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework, we consider three particular problems. The first one is clustering by the electoral model introduced in [Nesterov, 2018]. The second one is approximating the optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third one is approximating the optimal transport barycenter, for which we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.
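For orientation, the two particular cases named above can be written side by side with a generic inexact model. This is a sketch in commonly used notation, not a quotation of the paper's definitions, whose exact conditions and constants may differ. Here $V(y, x) = d(y) - d(x) - \langle \nabla d(x), y - x\rangle$ is the Bregman divergence of a convex reference function $d$, $\psi_\delta(y, x)$ is convex in $y$ with $\psi_\delta(x, x) = 0$, $Q$ is the feasible set, and every inequality is required for all $y \in Q$.

$$
\begin{aligned}
&(\delta, L)\text{ inexact oracle:} && 0 \le f(y) - f_\delta(x) - \langle g_\delta(x), y - x\rangle \le \tfrac{L}{2}\|y - x\|^2 + \delta,\\
&\text{relative smoothness:} && 0 \le f(y) - f(x) - \langle \nabla f(x), y - x\rangle \le L\,V(y, x),\\
&\text{inexact model:} && 0 \le f(y) - f_\delta(x) - \psi_\delta(y, x) \le L\,V(y, x) + \delta,\\
&\text{gradient-type step:} && x_{k+1} = \arg\min_{y \in Q}\bigl\{\psi_\delta(y, x_k) + L\,V(y, x_k)\bigr\}.
\end{aligned}
$$

Taking $\psi_\delta(y, x) = \langle g_\delta(x), y - x\rangle$ and $V(y, x) = \tfrac12\|y - x\|^2$ recovers the first line. Taking $\psi_\delta(y, x) = \langle \nabla f(x), y - x\rangle$, $f_\delta = f$ and $\delta = 0$ recovers the second, whose left inequality is just convexity of $f$.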
Submitted 23 March, 2019; v1 submitted 24 February, 2019;
originally announced February 2019.
-
Inexact Model: A Framework for Optimization and Variational Inequalities
Authors:
Fedor Stonyakin,
Alexander Gasnikov,
Alexander Tyurin,
Dmitry Pasechnyuk,
Artem Agafonov,
Pavel Dvurechensky,
Darina Dvinskikh,
Alexey Kroshnin,
Victorya Piskunova
Abstract:
In this paper, we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems, and variational inequalities. This framework allows one to obtain many known methods as special cases, including the accelerated gradient method, composite optimization methods, level-set methods, and proximal methods. The idea of the framework is to construct an inexact model of the main problem component, i.e., the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows one to construct new methods, which we illustrate by constructing a universal method for variational inequalities with a composite structure. This method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem's smoothness. We also generalize our framework to strongly convex objectives and strongly monotone variational inequalities.
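"Universal" behavior of the kind mentioned above (handling smooth and non-smooth problems without knowing the smoothness in advance) is often obtained by backtracking on the smoothness constant with an additive slack. The sketch below shows this idea for a plain, non-accelerated gradient method on an unconstrained convex problem, in the spirit of Nesterov's universal gradient methods. It illustrates the inexact-model viewpoint only. It is not the paper's method for variational inequalities, and the test function and names are hypothetical. A step with constant $L_k$ is accepted once $f(x_{k+1}) \le f(x_k) + \langle g_k, x_{k+1} - x_k\rangle + \tfrac{L_k}{2}\|x_{k+1} - x_k\|^2 + \tfrac{\varepsilon}{2}$. The $1/L_k$-weighted average $\bar x$ then satisfies $f(\bar x) - f_* \le \tfrac{R^2}{2\sum_k 1/L_k} + \tfrac{\varepsilon}{2}$ with $R \ge \|x_0 - x_*\|$.

```python
def universal_gradient(f, subgrad, x0, eps, R, L0=1.0):
    """Non-accelerated universal gradient method (sketch) for unconstrained convex f.

    R must bound ||x0 - x*||. The loop stops once sum_k 1/L_k >= R^2 / eps, so the
    1/L_k-weighted average of the iterates satisfies f(x_bar) - f* <= eps.
    """
    x = list(x0)
    L = L0
    weight = 0.0                          # sum of 1 / L_k
    x_sum = [0.0] * len(x)                # sum of x_{k+1} / L_k
    while weight < R * R / eps:
        fx, g = f(x), subgrad(x)
        L /= 2.0                          # try a smaller constant first
        while True:                       # backtracking: double L until the test passes
            y = [xi - gi / L for xi, gi in zip(x, g)]
            d = [yi - xi for yi, xi in zip(y, x)]
            upper = fx + sum(gi * di for gi, di in zip(g, d)) + 0.5 * L * sum(di * di for di in d)
            if f(y) <= upper + eps / 2.0:  # inexactness slack eps / 2
                break
            L *= 2.0
        weight += 1.0 / L
        x_sum = [s + yi / L for s, yi in zip(x_sum, y)]
        x = y
    return [s / weight for s in x_sum]


# Hypothetical non-smooth test: f(x) = |x1 - 1| + 2|x2 + 2|, minimized at (1, -2) with f* = 0.
def f_abs(x):
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] + 2.0)


def sign(v):
    return (v > 0) - (v < 0)


def subgrad_abs(x):
    return [float(sign(x[0] - 1.0)), 2.0 * sign(x[1] + 2.0)]


x_bar = universal_gradient(f_abs, subgrad_abs, [0.0, 0.0], eps=0.1, R=3.0)
```

The same function runs unchanged on a smooth objective. Backtracking then settles on a constant close to (or, thanks to the slack, below) the true gradient Lipschitz constant.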
Submitted 5 January, 2020; v1 submitted 3 February, 2019;
originally announced February 2019.
-
One Method for Minimization a Convex Lipschitz-Continuous Function of 2 Variables on a Fixed Square
Authors:
Dmitry A. Pasechnyuk,
Fedor S. Stonyakin
Abstract:
In this article, we obtain estimates of the convergence rate of a method recently proposed by Yu. E. Nesterov for minimizing a convex Lipschitz-continuous function of two variables on a square with a fixed side. The method solves auxiliary one-dimensional minimization problems along the separating segments and does not require computing the exact value of the gradient of the objective functional. Experiments have shown that the method under consideration can achieve the desired accuracy in less time than the other methods considered (gradient descent and the ellipsoid method), both when the exact solution is known and when estimates of the convergence rate of the methods are used.
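The basic step of the method on the square can be sketched as follows. Cut the current rectangle by its middle segment, alternating horizontal and vertical cuts, and minimize $f$ approximately along that segment. Then use the sign of the derivative orthogonal to the segment, taken at the point found, to discard the half that cannot contain the minimizer. The sketch assumes a differentiable convex test function and uses golden-section search for the one-dimensional subproblems. These choices, the iteration counts, and the function names are illustrative, not taken from the article.

```python
import math


def golden_section(phi, lo, hi, iters=100):
    """Approximate minimizer of a convex one-dimensional function phi on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    for _ in range(iters):
        if fc <= fd:                      # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                             # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def nesterov_square(f, grad, x_lo, x_hi, y_lo, y_hi, iters=40):
    """Dichotomy on a rectangle for a convex differentiable f(x, y) (sketch)."""
    for k in range(iters):
        if k % 2 == 0:
            # horizontal cut: minimize along y = y_mid, keep the lower or upper half
            y_mid = (y_lo + y_hi) / 2.0
            x_s = golden_section(lambda x: f(x, y_mid), x_lo, x_hi)
            if grad(x_s, y_mid)[1] > 0.0:
                y_hi = y_mid              # f increases in y there: drop the upper half
            else:
                y_lo = y_mid
        else:
            # vertical cut: minimize along x = x_mid, keep the left or right half
            x_mid = (x_lo + x_hi) / 2.0
            y_s = golden_section(lambda y: f(x_mid, y), y_lo, y_hi)
            if grad(x_mid, y_s)[0] > 0.0:
                x_hi = x_mid              # f increases in x there: drop the right half
            else:
                x_lo = x_mid
    return (x_lo + x_hi) / 2.0, (y_lo + y_hi) / 2.0


# Hypothetical smooth convex test function on the square [-1, 1]^2.
def f(x, y):
    return (x - 0.3) ** 2 + 2.0 * (y + 0.2) ** 2 + 0.5 * x * y


def grad(x, y):
    return (2.0 * (x - 0.3) + 0.5 * y, 4.0 * (y + 0.2) + 0.5 * x)


x_hat, y_hat = nesterov_square(f, grad, -1.0, 1.0, -1.0, 1.0)
```

The discard rule is valid by convexity. At the exact minimizer of $f$ along the segment, the derivative along the segment does not point into the square, so the orthogonal derivative alone decides which half can be dropped. An inexact one-dimensional solution perturbs this sign only near the optimum, which is the situation the article's convergence estimates address.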
Submitted 13 January, 2020; v1 submitted 26 December, 2018;
originally announced December 2018.